
Default hashmap implementation can cause processes to block #32953

Closed
derekstraka opened this issue Apr 14, 2016 · 16 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@derekstraka

When the /dev/urandom pool has not yet been initialized, the use of the default HashMap can cause a process to block. The behavior is problematic when a program is started by an initscript and is not intended to run in the background. The process will block until the urandom pool is initialized which on some embedded systems may be several minutes.

A concrete example of the issue is in docopt/docopt.rs#178, but it can also occur in other places, such as regex.

@derekstraka
Author

Should the crates use a non-default hasher or should the default implementation not block when the random pool isn't initialized?

@cardoe
Contributor

cardoe commented Apr 14, 2016

/cc @BurntSushi

@BurntSushi BurntSushi added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Apr 14, 2016
@BurntSushi
Member

cc @rust-lang/libs

@alexcrichton
Member

Interesting! One possible thing we could do is make the default hashing algorithm configurable, but that also seems pretty unfortunate in terms of how it may be implemented.

Another possibility would be to build on a strategy like #31356 except also provide a function to manually seed a thread. That way in cases like this a thread could be manually seeded and all future hash maps on that thread would use the same seed.

Exposing seeding though would be exposing the internals of using siphash, so that may not be that great either...

@cardoe
Contributor

cardoe commented Apr 15, 2016

So I believe the reason that Rust uses SipHash is to avoid the denial-of-service attack against many hash table implementations, since SipHash was designed to be resistant to it given a good seed. However, that really matters for unbounded data. My understanding is that if the data set is bounded, a "weaker" hasher is not dangerous with respect to a DoS. Obviously a second opinion is very welcome here.

In that case it would then make sense to allow different crates to choose their hashing algorithm if they know their data set is bounded. Obviously that adds some complexity by exposing options that people could get wrong in a bad way, but the default should remain SipHash; not sure if that's enough to be safe.

@bluss
Member

bluss commented Apr 15, 2016

As @derekstraka points out, getrandom can tell you when it would block, so libstd could detect that and fall back to a different seed, but the hash function's random key will inevitably be weaker in that case.

@cardoe
Contributor

cardoe commented Apr 15, 2016

Right. That would have to get exposed via the rand crate, which currently has no mechanism to do so. OsRng just wires up to the best implementation available.

@BurntSushi
Member

What should we do in the mean time? Does it make sense to start moving away from the default sip hasher in crates like regex? (i.e., preferring either fnv or BTreeMap?)

@bluss
Member

bluss commented Apr 15, 2016

@cardoe the relevant code is actually in this source tree, see libstd/sys/unix/rand.rs

@jcreekmore
Contributor

So, I would suggest migrating to something like FNV for crates that really don't need the DoS resiliency that SipHash provides (such as the regex crate). Extending SipHash (and ultimately rand) to allow a weaker seed value would still require migrating the other crates to use this extended SipHash mode, so you would have additional changes for very little benefit.

The biggest issue is how to communicate when you should use one hasher versus another inside of a crate. On the one hand, it is great that HashMap is proactive in providing a hasher that protects against hash-flooding denial-of-service attacks by default, but it is unfortunate that, by default, all users of HashMap pay a performance penalty when many (most?) of them do not need that protection.

@cardoe
Contributor

cardoe commented Apr 15, 2016

So I'll say that I checked systemd, which obviously runs at early boot: its hash implementation uses SipHash, and it also uses getrandom() to seed it, like Rust does. But it calls getrandom() with the non-blocking flag set, which avoids the problem.

@bluss
Member

bluss commented Apr 15, 2016

@cardoe you left out what they do if the getrandom call errors; that's the solution part 😉

systemd's code is here

So this was news to me: on a system where getrandom may block because there is not yet enough "entropy", a read from urandom will still succeed and not block. I don't think we can produce a better fallback / weak randomness than simply using the same methodology. (LWN on getrandom / urandom)

I think that HashMap::new() using a randomly keyed hash function, HashMaps being ubiquitous, and Rust wanting to be as portable as possible all support treating the random keying as a best-effort approach. We can't require every crate that uses HashMaps to switch to FNV. Besides, more uses of fnv just mean a net reduction in denial-of-service protection.

(We wouldn't take a best-effort approach if the API were explicitly named "create a HashMap with a strongly random hash function". But our API is just called HashMap::new().)

@cardoe
Contributor

cardoe commented Apr 15, 2016

Right, my point is that they seed their hasher on a best-effort basis, getting the best possible data they can in a non-blocking fashion by using a fallback. I didn't expand on how they did it.

I don't think @jcreekmore advocated changing all crates to FNV. He's advocating giving crates the option to change to FNV if they know their dataset is bounded, and therefore know they aren't vulnerable and can use a faster hash function, since FNV is faster than SipHash even when random data is available immediately.

Maybe leave HashMap::new() but add HashMap::new_with_hasher(h: Hasher)?

@bluss
Member

bluss commented Apr 15, 2016

Using fnv with HashMap is a stable feature since Rust 1.7 (link)
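A minimal sketch of what that looks like in practice (this hand-rolls a 64-bit FNV-1a hasher for illustration; in real code you would just depend on the fnv crate, which provides the same thing):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// 64-bit FNV-1a: fast, deterministic, and needs no random seed.
struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        FnvHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for FnvHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // FNV-1a: xor the byte in, then multiply by the FNV prime.
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
}

type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;

fn main() {
    // Constructing via Default (or HashMap::with_hasher) never touches
    // the OS RNG, so this cannot block at early boot.
    let mut m: FnvHashMap<&str, i32> = FnvHashMap::default();
    m.insert("answer", 42);
    assert_eq!(m["answer"], 42);
}
```

Because the key is fixed, this gives up the hash-flooding protection discussed above; it is only appropriate when the key set is bounded or trusted.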

@BurntSushi
Member

BurntSushi commented Apr 15, 2016

Yeah, I feel like I could be convinced to go tweak this to get past the bad behavior for now, but I feel like "don't use the default std::collections::HashMap if you don't want your program to block for some unspecified amount of time" is not a particularly nice thing. I think that means I'm sympathetic to @bluss's idea, but that also feels like a pretty big change that warrants more visibility.

@alexcrichton
Member

Oh wow, thanks for the tip @cardoe, and the code @bluss. Sounds like we should just implement that strategy in the standard library for now? I agree that we shouldn't stop recommending HashMap as a result of this, we need to figure out a solution!

@jcreekmore it's definitely true that some crates don't need this DoS protection, but one of the arguments for doing it by default is that a surprising number of crates might. For example, you may write a crate without considering that it will consume user input, but then it may be pulled into someone else's application where it does.

@BurntSushi I agree for now if you want to immediately work around this I'd just recommend using the FNV hashing for hash maps (or even btrees). Hopefully we can solve this in the standard library, however, so it's only a temporary solution.
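For the BTreeMap alternative mentioned here: since BTreeMap orders keys rather than hashing them, constructing one involves no random seed at all, so there is nothing to block on at early boot. A minimal illustration:

```rust
use std::collections::BTreeMap;

fn main() {
    // BTreeMap compares keys instead of hashing them, so creating and
    // using one never touches the OS entropy pool.
    let mut counts: BTreeMap<&str, u32> = BTreeMap::new();
    for word in ["a", "b", "a"] {
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts["a"], 2);
}
```

The trade-off is O(log n) lookups and an Ord bound on keys instead of Hash.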

cardoe added a commit to cardoe/rust that referenced this issue Apr 19, 2016
If we attempt a read with getrandom() on Linux, the syscall can block before the random pool is initialized unless the GRND_NONBLOCK flag is passed; with that flag, getrandom() instead returns EAGAIN while the pool is uninitialized. Because downstream users of crate or std functionality have no way to avoid this blocking behavior, this change makes Rust read bytes from /dev/urandom while getrandom() would block, and switch to getrandom() once it is available. Fixes rust-lang#32953.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
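The strategy described in the commit message can be sketched roughly as follows. This is an illustration, not the actual std implementation: the syscall number is x86_64-Linux-specific, and a real implementation would distinguish EAGAIN from other errors rather than falling back unconditionally.

```rust
use std::fs::File;
use std::io::{self, Read};

const SYS_GETRANDOM: i64 = 318; // x86_64 Linux only
const GRND_NONBLOCK: u32 = 0x0001;

extern "C" {
    // libc's generic syscall entry point; std binaries link libc on Linux.
    fn syscall(num: i64, ...) -> i64;
}

fn fill_random(buf: &mut [u8]) -> io::Result<()> {
    // Ask getrandom(2) for bytes, but tell it not to block if the
    // entropy pool is not yet initialized.
    let ret = unsafe {
        syscall(SYS_GETRANDOM, buf.as_mut_ptr(), buf.len(), GRND_NONBLOCK)
    };
    if ret == buf.len() as i64 {
        return Ok(()); // pool was ready; fully seeded bytes
    }
    // EAGAIN (pool not ready) or ENOSYS (pre-3.17 kernel): fall back to
    // /dev/urandom, which returns immediately in both cases.
    File::open("/dev/urandom")?.read_exact(buf)
}

fn main() -> io::Result<()> {
    let mut seed = [0u8; 16];
    fill_random(&mut seed)?;
    // `seed` would then key SipHash for HashMap's RandomState.
    Ok(())
}
```

This mirrors the systemd approach discussed above: best-effort randomness immediately, strong randomness once the kernel pool is initialized.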
bors added a commit that referenced this issue May 6, 2016
rand: don't block before random pool is initialized
