Unchecked vector pre-allocation #151

Closed
dbrgn opened this issue Nov 21, 2017 · 22 comments · Fixed by #221

Comments

@dbrgn (Contributor) commented Nov 21, 2017

When playing around with afl, I found the following example:

extern crate rmpv;

use std::io::Cursor;

fn main() {
    let data = [219, 175, 142, 142, 201, 219, 128, 0, 50, 175, 142, 196, 100, 212, 185];
    let mut cursor = Cursor::new(data);
    let decoded = rmpv::decode::value::read_value(&mut cursor);
    println!("Done: {:?}", decoded);
}

It takes almost 200ms in release mode and 2min 47s in debug mode to run.

@dbrgn commented Nov 21, 2017

Note that – at least in debug mode – the CPU usage is at 100% on one core and memory consumption is steadily increasing to a few gigabytes.

@dbrgn commented Nov 21, 2017

Sorry for the triple post. If I understand the msgpack spec correctly, this is because the data indicates that a raw str buffer with 2.8 GiB of data is following.

The decoder should probably abort after having reached the end of the input data, right? Or is this an issue with the Cursor?

@3Hren (Owner) commented Nov 21, 2017

Yep, the vector is pre-allocated to fit the specified size (2945355465 bytes) before the actual read happens. This is definitely a bug.
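
For reference, a minimal standalone sketch (not rmpv code) of how the str32 header in the example above encodes that length:

fn main() {
    // Same bytes as in the original example.
    let data = [219u8, 175, 142, 142, 201, 219, 128, 0, 50, 175, 142, 196, 100, 212, 185];
    // 219 (0xdb) is the str32 marker; the next four bytes are a big-endian u32
    // giving the declared payload length.
    let declared_len = u32::from_be_bytes([data[1], data[2], data[3], data[4]]);
    // Prints 2945355465 (about 2.7 GiB), even though only 10 bytes of input remain.
    println!("declared str length: {} bytes", declared_len);
}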

dbrgn changed the title from "Slow decoding" to "Unchecked vector pre-allocation" on Mar 8, 2018
@stusmall commented Mar 6, 2019

I took a quick look at fixing this. It looks like it should be easy to fix once rust-lang/rust#48043 stabilizes.

@dbrgn commented Mar 6, 2019

The question is whether we should even try to allocate that much memory, or whether a heuristic should be used to check whether there will actually be data in the input buffer to fill that memory...

E.g. if the remaining input data (in a non-streaming parser) is smaller than the requested buffer size, then something's probably wrong.

@huitseeker commented:

This issue came up in the recent audit of the dalek libraries by Quarkslab.

Is it worth opening a RustSec report?

@dbrgn commented Aug 27, 2019

Let's ask @tarcieri 🙂 It does have potential for denial-of-service.

@tarcieri commented:

This particular vulnerability has come up enough times in my life that I'd appreciate an advisory for it 😉

@Shnatsel commented Oct 1, 2019

We should probably request a Clippy lint for this kind of bug. It's a fairly common pattern and is very easy to detect, although you cannot be 100% sure it's a bug without taint analysis.

@dbrgn commented Oct 1, 2019

@Shnatsel how would that work? Vec::with_capacity(param) certainly has its uses.

@Shnatsel commented Oct 1, 2019

Where param is not a constant and is larger than what fits in a u16, warn that it may exhaust memory. Sadly this will lead to false positives, so we could only make it a lint that is not enabled by default.

https://github.com/facebookexperimental/MIRAI should be able to detect such cases much more reliably.
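
To illustrate (this is not an existing lint, and the function below is made up for the example), the pattern such a lint would flag looks roughly like this:

use std::io::{self, Read};

// Hypothetical example of the risky pattern: `len` comes from untrusted input
// and goes straight into Vec::with_capacity, so the allocation size is
// attacker-controlled no matter how much data actually follows.
fn read_blob<R: Read>(rd: &mut R, len: u32) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len as usize); // <- what the lint would warn about
    rd.take(u64::from(len)).read_to_end(&mut buf)?;
    Ok(buf)
}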

@dbrgn commented Oct 1, 2019

Ok, but in this case the param is not constant; it's part of the data. The problem isn't even mainly the memory exhaustion, it's the fact that 2 GiB are allocated even when the serialized data is only 20 bytes long.

@3Hren do you think aborting with an error when [requested allocation size > total (or remaining) serialized data size] is a good fix for this?

@3Hren (Owner) commented Oct 2, 2019

I think this would be enough.

@dbrgn commented Oct 2, 2019

Here's another minimal example:

let buf: &[u8] = &[
  // bin32 type
  0xc6,
  // 0xffffffff bytes (~4 GiB) of data will follow
  0xff, 0xff, 0xff, 0xff,
  // only 1 byte of data
  0x00,
];
read_value(&mut &buf[..]).unwrap();

Regarding the fix:

@3Hren do you think aborting with an error when [requested allocation size > total (or remaining) serialized data size] is a good fix for this?

I think this would be enough.

I tried to fix it, but then realized that the rmpv library always deals with R: Read types, which means that we do not know the length of the available data in advance.

I see two possibilities for an "easy" fix:

  • Only preallocate vectors up to a certain arbitrary size like 10 MiB (e.g. Vec::with_capacity(min(len, 10_485_760)))
  • Don't preallocate vectors at all

The first option would still have the advantage that all messages smaller than the limit don't require vector resizing, but I don't like the arbitrary limit that much. The limit could also be higher (e.g. 100 MiB), but it would still be arbitrary.

Any opinions?

@kornelski (Collaborator) commented:

Limiting capacity to some arbitrary size is OK IMHO, as it doesn't affect correctness in any way, and it's only a performance optimization.

Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream?

@skrap commented Oct 2, 2019

Could this be a premature optimization? Is there any measurable performance gain from preallocating? If not, perhaps just let Vec grow, with whatever algorithm it chooses, as new bytes from Read become available. This issue goes away in that case.

@dbrgn commented Oct 2, 2019

Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream?

Yeah, that might be possible. Good point.

Could this be a premature optimization? Is there any measurable performance gain from preallocating?

I wondered the same and wanted to do a benchmark, but it seems that the current benchmarks in the codebase don't work (they use undefined variables, among other things), so I haven't pursued that yet.

I agree that we should probably just stop preallocating vectors when we have no way of knowing how much data follows. The Rust standard library already has well-optimized algorithms for growing vectors.

@dbrgn commented Oct 2, 2019

Note: As an optimization there could also be a parallel API that takes a byte slice instead of an R: Read type, since then we could detect this problem by looking at the remaining length of the slice. We could then pre-allocate safely (of course only if this brings a reasonable performance benefit).
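
A rough sketch of what such a slice-based entry point could check (hypothetical helper, not part of rmpv's API):

// With a byte slice the total remaining input length is known up front, so a
// declared payload length that exceeds it can be rejected before any allocation.
fn read_bin_checked(buf: &[u8], declared_len: usize) -> Result<Vec<u8>, &'static str> {
    if declared_len > buf.len() {
        return Err("declared length exceeds remaining input");
    }
    // Safe to allocate exactly `declared_len` bytes: the data really is there.
    Ok(buf[..declared_len].to_vec())
}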

@kornelski (Collaborator) commented:

Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream?

Yeah, that might be possible.

Do you have an example, or could describe how it can be done?

@dbrgn commented Oct 3, 2019

@kornelski this is the spec for arrays with up to 2**32-1 entries:

array 32 stores an array whose length is upto (2^32)-1 elements:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
|  0xdd  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|    N objects    |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+

Note that the following things are all untested.

If we write 0xdd 0x00 0x0f 0x42 0x40 ... (not sure if I got the endianness correct) that would specify an array of 1 million elements.

While the type of the elements is not known in advance (IIRC), it will take at least a byte per element. I guess that's why rmpv preallocates 1 million bytes (1 MiB) of RAM:

fn read_array_data<R: Read>(rd: &mut R, mut len: usize) -> Result<Vec<Value>, Error> {
    // `len` comes straight from the wire and is trusted as the capacity here.
    let mut vec = Vec::with_capacity(len);
    ...

Now the library will iterate over the values. If the first element in that array is also an array with 1 million entries, then that will result in another Vec of ~1 MiB being allocated (so 2 in total).

This can be nested. I'm not sure if there's a max recursion depth, but of course if you nest 100 arrays that all allocate 100 MiB of space, then that will already require 10 GiB of RAM, without any single vector exceeding that 100 MiB size.

It's not exponential, since it will error as soon as the parser notices that there's not actually sufficient data to fill all those arrays. But it's still possible to do a DoS even if we add a preallocation limit.

I'll create a PR that replaces the with_capacity calls with new. If someone has a better idea, we can still add another fix later on. But I don't see how preallocating without any knowledge of the remaining data size will work safely for untrusted input.
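
A minimal sketch of that direction (the function and the reader callback are illustrative, not rmpv's actual internals):

// Grow the Vec as elements are actually decoded instead of trusting the
// declared `len` up front; growth stops as soon as the input runs out.
fn collect_elements<T>(len: usize, mut read_one: impl FnMut() -> Option<T>) -> Vec<T> {
    let mut vec = Vec::new(); // previously: Vec::with_capacity(len)
    for _ in 0..len {
        match read_one() {
            Some(value) => vec.push(value),
            None => break, // input exhausted before `len` elements were read
        }
    }
    vec
}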

Edit: With this example:

#[test]
fn invalid_buf_size_arr() {
    // This invalid buffer requests a nested array of depth 10.
    // All arrays contain the maximum possible number of elements.
    // If a byte is preallocated for every array content,
    // that would require 40 GiB of RAM.
    let buf: &[u8] = &[
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
    ];
    let result = read_value(&mut &buf[..]);
}

...the code SIGABRTs at runtime with the message memory allocation of 240518168520 bytes failed

dbrgn added a commit to dbrgn/msgpack-rust that referenced this issue Oct 3, 2019
@daboross (Collaborator) commented Oct 3, 2019

I agree that we should probably just stop preallocating vectors when we have no way of knowing how much data follows. The Rust standard library already has well-optimized algorithms for growing vectors.

It does have well-optimized algorithms, but even the best growth strategy still involves log_2(n) allocations.

I'd still think that pre-allocating under some limit is better than not pre-allocating at all. If we pre-allocate under some arbitrary size, then arrays over that limit decode slower than arrays under it. But code with pre-allocation, even with an arbitrary cap, will always be faster than code without it?

serde's Deserialize implementations for Vec, HashSet, etc. cap out pre-allocation at 4096 elements. Perhaps we could do the same?
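
If we went that route, a sketch of the capping approach could look like this (the 4096 figure is the one mentioned above; the helper name is made up):

// Reserve at most CAP elements up front; anything larger simply grows on demand.
// Small, honest messages still avoid reallocations, while a forged length field
// can no longer force a huge up-front allocation.
fn vec_with_capped_capacity<T>(declared_len: usize) -> Vec<T> {
    const CAP: usize = 4096;
    Vec::with_capacity(declared_len.min(CAP))
}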

dbrgn added three commits to dbrgn/msgpack-rust that referenced this issue Oct 8, 2019
@kornelski (Collaborator) commented:

Fix released in rmpv v0.4.2
