Abort on some large allocation requests, panic on others #26951
For the record, your code is the equivalent of `fn main() { let v = ::std::vec::from_elem(0u16, !0); }`. In the […]
Additional info: already […]
/cc @rust-lang/compiler, I think?
This is a @rust-lang/libs problem. I am of the opinion that aborts should only occur on problems that unwinding can't address. That is, if you overflow the capacity we have to check for that, and we can panic right away. Everything will be cleaned up and the world will be happy.

However, if an allocation makes it past our checks and then fails in the allocator, we have historically conservatively assumed that you are legitimately OOM and abort; unwinding can lead to arbitrary other allocations, which is presumably bad to do during OOM. In practice, the platforms we support basically can't OOM naturally (by which I mean, you're asking for the last few bits of free memory), so I reckon if you ever managed to trigger an OOM it's because you managed to request All The Memory, in which case there's plenty of slack space for unwinding. If it's a legit OOM then we'll double panic and crash anyway.

However, #14674 mentions that aborting on OOM enables LLVM to better optimize around allocations. I don't know the details.

Edit: Regardless, I do not want to add any kind of "heuristic checks" for "will probably OOM" or "was probably a real OOM".
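A simplified sketch of the two failure paths described above (illustrative only, written with today's `std::alloc` names rather than the liballoc internals of the time): the up-front capacity check can panic cleanly because nothing has been allocated yet, while a null return from the allocator is conservatively treated as a real OOM and aborts.

```rust
use std::alloc::{alloc, Layout};
use std::process;

fn allocate_array<T>(len: usize) -> *mut u8 {
    // Path 1: up-front size check. Nothing has been allocated yet, so a
    // failure here can simply panic ("capacity overflow") and unwind cleanly.
    let layout = Layout::array::<T>(len).expect("capacity overflow");
    assert!(layout.size() > 0, "sketch only covers non-empty allocations");

    // Path 2: the request passed our checks, so we hand it to the allocator.
    // If it reports failure we may be genuinely out of memory, so the
    // standard library has conservatively aborted rather than risk
    // allocating while unwinding.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        process::abort(); // roughly what alloc::oom() did at the time
    }
    ptr
}
```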
I think that, ultimately, abort vs. unwind behaviour should be left up to the implementation, but the standard library should abort on failed allocations. This particular example is just a red herring. It's only because […]
Also to be completely clear to those not super familiar with the issue: the "odd" behaviour described in this issue is completely by design, and not a mistake. Rather this issue is positing that the behaviour should be changed to be more consistent. |
Of course. I don't think it's completely by design; it's patched up to be this way. For example, if we know a capacity of […]
@bluss any time we detect this, I believe we panic. Only […] We simply don't even check in some cases where we know any degeneracies will be caught by the allocator (e.g. when doubling capacity on 64-bit).
Note that checking for […]
That’s what they said about 32-bit timestamps and addressing too.
@nagisa Today this is simply a hard hardware issue: you only get 48 bits of the address space. |
Yes, but it is always useful to be aware of future prospects. |
@nagisa I have thankfully architected the current design to make such an upgrade trivial. In fact, all you have to do is delete some code! https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L445-L453
So is this not a bug? |
It's not an accident, at least.
I think all conversations I've had about this have said that performance improvements from aborting instead of panicking are unlikely; it saves code size in the binary with likely very little to no runtime impact. We don't have numbers. So I reported this because I'd like to use […]
I think the @rust-lang/libs team just needs to make a final call on this. I think @bluss and I have both made our stances fairly clear, but to summarize (hopefully I'm not misrepresenting bluss):

- Always Abort: […]
- Sometimes Abort: […]
- Never Abort (wild card!): […]
Code size may have an impact (from unrelated Servo discussion) 1: […]
triage: P-medium. As an action item, the libs team has decided to see if it's possible to panic on OOM, and we can possibly reevaluate from there.
Would be interesting to check if a singleton pointer value is enough to unwind without allocating. In the case of true OOM, it's possible that secondary allocations during unwinding (which should be rare, I can't recall ever seeing an allocating destructor) could succeed due to deallocations earlier in the unwinding process.
@eddyb I think our main concern is libunwind allocating. A quick look seems to reveal they never check their mallocs. However maybe there's some macro shenanigans or maybe this is all utility code that isn't called by us. I didn't really dig deep into it (also maybe I checked out the wrong version...). |
In particular if Rust allocates during OOM that should be fine -- we always check and a double panic is just an abort. |
Java pre-allocates an OutOfMemory exception at initialization to ensure it can properly throw when out of memory - it might make sense to do the same thing here if possible. |
IIRC Python does something similar as well.
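A hedged sketch of what that Java-style pattern could look like in Rust (the names here are illustrative, not an existing std API): box the payload once at startup, then hand the already-boxed value to `std::panic::resume_unwind` when the OOM panic is raised, so that raising the panic does not itself need a fresh allocation (the unwinder may still allocate internally, as noted above).

```rust
use std::any::Any;
use std::panic;
use std::sync::Mutex;

#[derive(Debug)]
struct OutOfMemory;

// Payload boxed once at startup, while memory is still plentiful.
static PREALLOCATED_OOM: Mutex<Option<Box<dyn Any + Send>>> = Mutex::new(None);

fn init_oom_payload() {
    *PREALLOCATED_OOM.lock().unwrap() =
        Some(Box::new(OutOfMemory) as Box<dyn Any + Send>);
}

fn raise_oom() -> ! {
    // No fresh allocation for the payload here; it was boxed in advance.
    let payload = PREALLOCATED_OOM
        .lock()
        .unwrap()
        .take()
        .expect("OOM payload missing or already consumed");
    panic::resume_unwind(payload)
}
```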
Preallocating might be difficult given that Rust code might not be called from inside a Rust main. I guess we could use pre-allocation for when it is called from a Rust program, and then fall back to something else for the case where it is called from foreign code.
cc me |
@Aatch the problem of preallocation seems pointless, since we control what the pointer value means, and if it points to some […]
Note that this is a memory safety issue for a few types: Box::new can't panic today -- this would introduce more exception-safety nonsense. |
I don't really have an opinion on how the problem is solved; I would just recommend fixing the illegal instruction error soonest. I had bad code that was incorrectly calculating a vector length, which only kicked in when I tried to run the release build without arguments. Having the program crash repeatedly with "Illegal Instruction" had me incorrectly assuming the Rust compiler was fubar and generating bad code. At least I'm assuming it's the same issue, with Rust 1.2 on a 64-bit machine, calling something like:
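The snippet itself did not survive the thread formatting; as a hedged guess, the shape of the bug described is something like a length computed from the argument count that wraps to a huge `usize` in release mode:

```rust
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    // With no arguments, args.len() is 1, so this wraps to a huge value in a
    // release build (a debug build would panic on the subtraction instead).
    let len = args.len() - 2;
    // Requesting ~2^64 bytes: the allocator fails and the program aborts
    // with "Illegal Instruction", matching the report above.
    let v = vec![0u8; len];
    println!("{}", v.len());
}
```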
Please don't panic on OOM, especially now that catch_panic is safe. |
@pixel27 I imagine a good fix would be a better way to report the issue. Maybe an output message before the abort (which I believe is an issue we have filed). |
@pythonesque Why? If it's not possible to deal with OOM in Rust, I can't use Rust. See https://internals.rust-lang.org/t/could-we-support-unwinding-from-oom-at-least-for-collections/3673/21 for an exploration of why handling OOM is important. |
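For context on what unwinding-based recovery would look like: `std::panic::catch_unwind` (the stabilized form of the `catch_panic` mentioned above) can already contain the "capacity overflow" panic from the `u16` case in this issue; it is only the allocator-failure abort (the `u8` case) that it cannot catch. A minimal sketch:

```rust
use std::panic;

fn main() {
    let result = panic::catch_unwind(|| {
        // The byte-count computation overflows, so this panics with
        // "capacity overflow" rather than reaching the allocator.
        let v = vec![0u16; usize::MAX];
        v.len()
    });
    match result {
        Ok(len) => println!("allocated {} elements", len),
        Err(_) => println!("recovered from the allocation panic via unwinding"),
    }
}
```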
@nathanaeljones IMO the real problem here is that we don't have an allocator abstraction in […]
@nathanaeljones the default std allocator aborts, but given Rust's low-level nature, you could replace it with one that doesn't. It is a lot more work right now, but as @eddyb said, this is something that will come along in time.
@steveklabnik Thanks; I'm not too concerned with 'lots of work', but rather with whether I can produce well-behaved binaries for both FFI and server use. I started porting Imageflow over to Rust in June, but failed to verify that OOM panics (as widely reported) instead of aborting. I'd assumed that the stabilization of […]

It's very unclear to me where the anti-OOM-panic sentiment arises from. Double panics abort. I've been trying to track down blockers around this, but haven't been able to discover the reasons why movement on graceful OOM handling has been slow. I think I might have found the right issue to discuss: #27700 (comment)
Triage: still blocked on a libs team decision. |
Looks like there is now an unstable […]. Mentioned in that issue, there is also an accepted RFC to set a cargo attribute for […]. And another issue to make the default for […].
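For completeness, fallible allocation did eventually land for collections in the form of `try_reserve` (stable since Rust 1.57), which reports failure as a `Result` instead of panicking or aborting; a small sketch of the graceful handling asked for earlier in the thread:

```rust
use std::collections::TryReserveError;

fn read_exactly(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?; // fails cleanly if `len` bytes can't be had
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    match read_exactly(usize::MAX) {
        Ok(buf) => println!("got {} bytes", buf.len()),
        Err(e) => eprintln!("allocation failed: {e}"),
    }
}
```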
We discussed this in the Libs API meeting today: the consensus was that we should pipe "invalid layout" errors through to […]
Compare these two: we request a vector with 18446744073709551615 elements.

- With `u8`, we receive out of memory (null) from the allocator and call `alloc::oom`, which aborts the program: `application terminated abnormally with signal 4 (Illegal instruction)`.
- With `u16`, we get an assertion: `thread '<main>' panicked at 'capacity overflow', ../src/libcore/option.rs:330`.
This is inconsistent. Why don't we abort in both of these cases? We already abort on allocation requests that are too large, so requests that are even larger could abort too.
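A minimal reproduction of the two cases, based on the note earlier in the thread that this request is equivalent to `std::vec::from_elem(elem, !0)` (run one case at a time, since the first aborts the whole process):

```rust
fn main() {
    // u8: the byte count fits in usize, the allocator returns null, and
    // (at the time) alloc::oom() aborted: signal 4 / "Illegal instruction".
    let _a = vec![0u8; !0];

    // u16: the byte count (len * 2) overflows usize, so the up-front
    // capacity check panics with "capacity overflow" instead.
    // let _b = vec![0u16; !0];
}
```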