-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std: introduce a thread-local CSPRNG for general use #7482
Conversation
Failing test case in ReleaseSafe mode. ReleaseFast, ReleaseSmall, and Debug do not trip the segfault. pub fn main() void {}
In GDB I see:
This is why I added those
So it does look like the TLS code ran before the segfaulting line. This is also evident from strace:
|
Strangely enough, the failure seems to be caused by a95458a, although the segfault happens before that line. |
Issue fixed in f6d6709 - it was a long-standing bug with thread local storage not being properly aligned. |
} else { | ||
self.state.permute(); | ||
} | ||
mem.set(u8, self.state.toSlice()[0..std.crypto.core.Gimli.RATE], 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zeroing the state like this is.. weird?
Let's add a method to Gimli (and/or the random interface) so the caller doesn't touch the internal state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from
Lines 767 to 776 in ad05509
fn fill(r: *Random, buf: []u8) void { | |
const self = @fieldParentPtr(Gimli, "random", r); | |
if (buf.len != 0) { | |
self.state.squeeze(buf); | |
} else { | |
self.state.permute(); | |
} | |
mem.set(u8, self.state.toSlice()[0..std.crypto.core.Gimli.RATE], 0); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but don't you agree it'd be better to keep this implementation detail inside the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
Thread-safe but not fork-safe, that's the hardest (?) part. |
I do not recognize fork-safety as a valid use case. Do you? |
Yes? OpenSSL folks didn't think that could be a problem until they were hit by a nasty CVE... |
fork-safety is important. but it could be tricky to implement it on wide range of platform. For posix+libc, it is possible to call |
Please, can someone tell me what is a use case for an application to ever call fork() without calling execve() directly after? fork() is OS developers thinking they are clever while avoiding the actual real use case of spawning a child process. |
@andrewrk it is a common way to use but I agree that the current state of art for if you prefer to not implement it, I think it should be explicitly documented and a way to force the CSPRNG reseeding should be possible (I didn't check if it is already possible). Some crypto are just completely insecure if some random is reused. |
The whole nginx architecture is organized in single-threaded processes that are repatey forked to take advantage of the multi-core processing capabilities. The idea is to avoid the cross-thread synchronization overhead by giving each process an independent copy of the process address space. |
Hi, I still have mixed feelings about using Looks like Zeroing the seed after we use it is great for forward security, however, it may not be far fetched to imagine that other parts of the application, in particular imported C code, may also use Instead of zeroing, we could overwrite it with the output of our own RNG. But does a scenario where A boring call to There are very valid cases for
Transparent and reliable While performance is important for non-cryptographically secure RNGs, preventing misuse may be more important for CSPRNGs, whose main use case is generating secret keys. The speed of a CSPRNG is rarely a bottleneck in real world applications. Plus, we have better ways to improve performance, such as using faster functions to generate the random stream. |
libressl is using it as fallback (but with lot of others inputs) for seeding random generator on Linux. But beck@ commented it as "Not as random as you think but we take what we are given". I could ask for more details if wanted.
yes, it could be a good direction to take. |
Thank you for the review, @jedisct1 and others.
This is unnecessary because on those targets this functionality simply calls
Wouldn't this be an argument for having the implementation of std.crypto.random unconditionally make a syscall to getrandom()? I think if we accept the premise of managing our own CSPRNG in application memory space, then it would allow reusing the seed for stack canaries as well as general purpose use.
In the code path where we zero or overwrite the value, it only runs when Zig controls the start code. If Zig code is embedded in an application written in a different language, this code path is not executed, and the lazy initialization with getrandom() is the active path. In the code path where zig controls the start code, we are guaranteed to be the first code path to inspect AT_RANDOM.
This patch does not use AT_RANDOM when libc is linked. It does exactly what you said, it calls getrandom at initialization time.
I do not recognize this as a valid use case. Redis is a database in which it is intended to use close to 100% of the memory available on a system as a cache. Forking to save a snapshot suddenly doubles the amount of memory that the application possibly needs, depending on end-user-controlled usage (hitting the cache). Enough cache writes, and redis gets OOM killed. It invites nondeterminism into whether your snapshot will cause a production service outage. This is a design flaw in Redis and it caused a production outage when I worked at OkCupid. However,
This is enough for me. I agree.
That sounds fine, although I will want to call the raw version for std.ChildProcess which does not need random data between fork() and execve(). Now that you have convinced me fork safety is something that we want to achieve, we have quite a few tricks up our sleeves. @semarie pointed out a possible solution for posix+libc:
For systems with arc4random the problem is solved. For Windows, can we agree we do not need fork safety? Implementing fork() on Windows is possible but it is quite a hack. For systems that don't have pthread_atfork and need libc:
There are not many OS's left after this flow chart. I can also offer my own argument for why to ignore AT_RANDOM and rely exclusively on getrandom(): it's useless startup code that does not need to run when an application does not use std.crypto.random. Maybe the lazy init cost is a better tradeoff. |
The usual trick is to use |
std.crypto.random * cross platform, even freestanding * can't fail. on initialization for some systems requires calling os.getrandom(), in which case there are rare but theoretically possible errors. The code panics in these cases, however the application may choose to override the default seed function and then handle the failure another way. * thread-safe * supports the full Random interface * cryptographically secure * no syscall required to initialize on Linux (AT_RANDOM) * calls arc4random on systems that support it `std.crypto.randomBytes` is removed in favor of `std.crypto.random.bytes`. I moved some of the Random implementations into their own files in the interest of organization. stage2 no longer requires passing a RNG; instead it uses this API. Closes #6704
* get rid of the pointless fences * make seed_len 16 instead of 32, which is accurate since it was already padding the rest anyway; now we do 1 pad instead of 2. * secureZero to clear the AT_RANDOM auxval * add a flag root source files can use to disable the start code. This is in case people want to opt out of the initialization when they don't depend on it.
I'd be all for this. It would not only reduce the code size, but also avoid being limited to a 128 bit seed. |
@LemonBoy thanks for the tip, will have a new commit for you to review momentarily. On BSD/Darwin we don't need |
Everybody gets what they want! * AT_RANDOM is completely ignored. * On Linux, MADV_WIPEONFORK is used to provide fork safety. * On pthread systems, `pthread_atfork` is used to provide fork safety. * For systems that do not have the capability to provide fork safety, the implementation falls back to calling getrandom() every time. * If madvise is unavailable or returns an error, or pthread_atfork fails for whatever reason, it falls back to calling getrandom() every time. * Applications may choose to opt-out of fork safety. * Applications may choose to opt-in to unconditionally calling getrandom() for every call to std.crypto.random.fillFn. * Added `std.meta.globalOption`. * Added `std.os.madvise` and related bits. * Bumped up the size of the main thread TLS buffer. See the comment there for justification. * Simpler hot path in TLS initialization.
Everybody gets what they want! const std = @import("std");
pub fn main() !void {
std.debug.print("Before fork, random = {d}\n", .{std.crypto.random.int(i32)});
const pid = try std.os.fork();
std.debug.print("pid = {d}, random = {d}\n", .{ pid, std.crypto.random.int(i32) });
} fork safety demonstrated on linux 5.4.77, no libc.strace reveals the MADV_WIPEONFORK:
fork safety demonstrated on linux 4.13, with libcmadvise did not have MADV_WIPEONFORK yet. pthread_atfork gets called here:
fork safety demonstrated on linux 4.13, without libc.No safety is available so it always calls getrandom().
fork safety explicitly disabled by the application:const std = @import("std");
pub const crypto_fork_safety = false;
pub fn main() !void {
std.debug.print("Before fork, random = {d}\n", .{std.crypto.random.int(i32)});
const pid = try std.os.fork();
std.debug.print("pid = {d}, random = {d}\n", .{ pid, std.crypto.random.int(i32) });
}
Explicitly request to always make the OS syscall:const std = @import("std");
pub const crypto_always_getrandom = true;
pub fn main() !void {
std.debug.print("Before fork, random = {d}\n", .{std.crypto.random.int(i32)});
const pid = try std.os.fork();
std.debug.print("pid = {d}, random = {d}\n", .{ pid, std.crypto.random.int(i32) });
}
|
That bcrypt issue is tripping on both linux (with -lc) and freebsd:
The pub fn secureZero(comptime T: type, s: []T) void {
const ptr = @ptrCast([*]volatile u8, s.ptr);
const length = s.len * @sizeOf(T);
@memset(ptr, 0, length);
} Relevant bcrypt code is: var password_buf: [73]u8 = undefined;
// ...
utils.secureZero(u8, &password_buf); It's also inconsistent. It happens or does not happen depending on how many other tests are run. If I isolate the bcrypt test it works fine. Edit: I double checked the LLVM IR output for secureZero: ; Function Attrs: nobuiltin nounwind
define internal fastcc void @crypto.utils.secureZero(%"[]u8"* nonnull readonly align 8 %0) unnamed_addr #1 !dbg !288689 {
Entry:
%ptr = alloca i8*, align 8
%length = alloca i64, align 8
call void @llvm.dbg.declare(metadata %"[]u8"* %0, metadata !288691, metadata !DIExpression()), !dbg !288695
%1 = getelementptr inbounds %"[]u8", %"[]u8"* %0, i32 0, i32 0, !dbg !288696
%2 = load i8*, i8** %1, align 8, !dbg !288696
store i8* %2, i8** %ptr, align 8, !dbg !288697
call void @llvm.dbg.declare(metadata i8** %ptr, metadata !288692, metadata !DIExpression()), !dbg !288698
%3 = getelementptr inbounds %"[]u8", %"[]u8"* %0, i32 0, i32 1, !dbg !288699
%4 = load i64, i64* %3, align 8, !dbg !288699
%5 = mul nuw i64 %4, 1, !dbg !288700
store i64 %5, i64* %length, align 8, !dbg !288700
call void @llvm.dbg.declare(metadata i64* %length, metadata !288694, metadata !DIExpression()), !dbg !288701
%6 = load i8*, i8** %ptr, align 8, !dbg !288702
%7 = load i64, i64* %length, align 8, !dbg !288703
call void @llvm.memset.p0i8.i64(i8* align 4096 %6, i8 0, i64 %7, i1 true), !dbg !288704
ret void, !dbg !288705
} It's calling Edit 2: I think I have it figured out, there was another long standing alignment bug that this is exposing. fn childAtForkHandler() callconv(.C) void {
const wipe_slice = @ptrCast([*]u8, &wipe_me)[0..@sizeOf(@TypeOf(wipe_me))];
std.crypto.utils.secureZero(u8, wipe_slice);
} This is the new code from the PR. And so secureZero for all |
std.crypto.random
os.getrandom(), in which case there are rare but theoretically
possible errors. The code panics in these cases, however the
application may choose to override the default seed function and then
handle the failure another way.
std.crypto.randomBytes
is removed in favor ofstd.crypto.random.bytes
.I moved some of the Random implementations into their own files in the
interest of organization.
stage2 no longer requires passing a RNG; instead it uses this API.
Closes #6704
cc @jedisct1