Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious "filename too long" failure on bots #33844

Closed
alexcrichton opened this issue May 24, 2016 · 13 comments
Closed

Spurious "filename too long" failure on bots #33844

alexcrichton opened this issue May 24, 2016 · 13 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) O-windows Operating system: Windows

Comments

@alexcrichton
Copy link
Member

http://buildbot.rust-lang.org/builders/auto-win-gnu-32-opt-rustbuild/builds/1204
https://gist.github.com/alexcrichton/e72877d88345cac07278c6449cc343e8

failures:

---- sync::atomic::AtomicU8::load_0 stdout ----
    thread 'sync::atomic::AtomicU8::load_0' panicked at 'couldn't run the test: The filename or extension is too long. (os error 206)', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:304
note: Run with `RUST_BACKTRACE=1` for a backtrace.

---- sync::atomic::AtomicUsize::fetch_xor_0 stdout ----
    thread 'sync::atomic::AtomicUsize::fetch_xor_0' panicked at 'couldn't run the test: The filename or extension is too long. (os error 206)', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:304

---- sync::atomic_1 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'sync::atomic_1' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'sync::atomic_1' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- sync::atomic::AtomicUsize::load_0 stdout ----
    thread 'sync::atomic::AtomicUsize::load_0' panicked at 'couldn't run the test: The filename or extension is too long. (os error 206)', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:304

---- unimplemented!_0 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'unimplemented!_0' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'unimplemented!_0' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- sync::atomic::AtomicUsize::store_0 stdout ----
    thread 'sync::atomic::AtomicUsize::store_0' panicked at 'couldn't run the test: The filename or extension is too long. (os error 206)', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:304

---- try!_0 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'try!_0' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'try!_0' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- unimplemented!_1 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'unimplemented!_1' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'unimplemented!_1' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- sync::atomic_0 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'sync::atomic_0' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'sync::atomic_0' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- unreachable!_0 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'unreachable!_0' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'unreachable!_0' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- sync::atomic::AtomicUsize::swap_0 stdout ----
    thread 'sync::atomic::AtomicUsize::swap_0' panicked at 'couldn't run the test: The filename or extension is too long. (os error 206)', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:304

---- unreachable!_1 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'unreachable!_1' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'unreachable!_1' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- write!_0 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'write!_0' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'write!_0' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281

---- writeln!_0 stdout ----
    error: could not exec the linker `gcc`: The filename or extension is too long. (os error 206)
thread 'writeln!_0' panicked at 'Box<Any>', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustc\session/mod.rs:169
thread 'writeln!_0' panicked at 'couldn't compile the test', C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\src\librustdoc\test.rs:281


failures:
    sync::atomic::AtomicU8::load_0
    sync::atomic::AtomicUsize::fetch_xor_0
    sync::atomic::AtomicUsize::load_0
    sync::atomic::AtomicUsize::store_0
    sync::atomic::AtomicUsize::swap_0
    sync::atomic_0
    sync::atomic_1
    try!_0
    unimplemented!_0
    unimplemented!_1
    unreachable!_0
    unreachable!_1
    write!_0
    writeln!_0

test result: FAILED. 927 passed; 14 failed; 9 ignored; 0 measured

Weird!

@alexcrichton alexcrichton added O-windows Operating system: Windows A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) labels May 24, 2016
@retep998
Copy link
Member

This error can occur when the gcc command is over 32,767 characters. It would be nice if there was some way to check the length of the lpCommandLine that is passed to CreateProcessW when spawning a Command so a better diagnostic can be emitted. At the very least print out the linker arguments when that error code occurs.

@alexcrichton
Copy link
Member Author

Failure appears to always be the auto-win-gnu-32-opt-rustbuild builder so far, probably related to how cargo passes a few extra arguments.

retep998 added a commit to retep998/rust that referenced this issue May 31, 2016
Needed to diagnose rust-lang#33844

Signed-off-by: Peter Atashian <retep998@gmail.com>
bors added a commit that referenced this issue Jun 1, 2016
Attempt to diagnose #33844

#33844 is a spurious failure that causes builds to fail due to the linker command sometimes failing with error 206, which means that the command is too long. This PR makes rustc print out the linker arguments in that case so the reason for it being so long can be diagnosed and hopefully fixed.

r? @alexcrichton
@xen0n
Copy link
Contributor

xen0n commented Jun 1, 2016

Hi everyone, what about moving to the so-called response files? (stackoverflow, msdn, gcc) I programmed in Windows not so long ago and it's how Visual Studio has always been doing to workaround the argv limit. Apart from having to deal with exactly one temp file it'll solve the problem once and for all. Maybe I could even take this and produce a patch.

Opinions?

@retep998
Copy link
Member

retep998 commented Jun 1, 2016

The correct link for link.exe is https://msdn.microsoft.com/en-us/library/9xch38h8.aspx
This is definitely a valid approach to take, although we really should investigate why the command line is so long in the first place to see if there is something accidentally inflating it.

@xen0n
Copy link
Contributor

xen0n commented Jun 1, 2016

@retep998 That link was simply copy'n'pasted from the SO link, thanks for the correction. I agree that the problematic invocations should be investigated first because it's only occasionally showing up.

@alexcrichton
Copy link
Member Author

The captured instance of this shows the command line as not very long, so my guess is that it's actually related to the environment variable block. This can also be only up to 32K.

The compiler will attempt to update the PATH environment variable when loading dylibs in a fashion that with concurrent tests in rustdoc it's possible for PATH to just keep growing indefinitely. That way once you start hitting the final tests you'll probably start seeing failures with the environment block too big.

Specifically, I suspect that this block needs to be changed. I believe we've run into this before in #17327 which led to #17360, so I'm gonna close this in favor of #17360.

@vadimcn
Copy link
Contributor

vadimcn commented Jun 3, 2016

The compiler will attempt to update the PATH environment variable when loading dylibs in a fashion that with concurrent tests in rustdoc it's possible for PATH to just keep growing indefinitely.

I am kinda skeptical of this. It'd take a lot of threads entering this block at the same time to add the PATH up to 32k.
As far as I understand, the main danger of raciness in this code is that another thread may restore the original value of PATH between the current thread adding to it and calling LoadLibrary. But in that case the error would be something along the lines of "DLL not found".

Also, if PATH is growing too long, wouldn't the error be happening in SetEnvironmentVariable rather than in CreateProcess?

@retep998
Copy link
Member

retep998 commented Jun 3, 2016

@vadimcn The error is happening as a result of Command::output which means potentially any system function called inside there, not necessarily CreateProcessW.

@retep998
Copy link
Member

retep998 commented Jun 3, 2016

Also, on the matter of the environment block being too big...

The ANSI version of this function, CreateProcessA fails if the total size of the environment block for the process exceeds 32,767 characters. MSDN

Since we're calling CreateProcessW and not CreateProcessA, that shouldn't be the issue, probably. Then again MSDN could be lying to us. Or maybe with the unicode version while there is no limit on the block itself, there could still be a limit on the individual environment variables themselves. If so, the proper solution is to get rid of the raciness and just thread PATH through somehow.

@xen0n
Copy link
Contributor

xen0n commented Jun 3, 2016

@retep998 I just confirmed a limit of ~32K on at least individual environment variables indeed exists:

use std::process::{Command, Output};


fn test(n: usize) -> std::io::Result<Output> {
    // just about anything works, I simply used a hello world
    let mut c = Command::new("hello.exe");

    let path = {
        let mut parts: Vec<u8> = Vec::with_capacity(n + 3);  // "C:\\AAA...A"
        parts.push(67);
        parts.push(58);
        parts.push(92);
        for i in 0..n {
            parts.push(65);
        }
        // parts.push(0);
        String::from_utf8(parts).unwrap()
    };
    c.env("PATH", path);

    c.output()
}


fn main() {
    let mut l = 1usize;
    let mut r = 1usize;

    loop {
        let ret = test(r);
        if let Err(_) = ret {
            break;
        }

        l = r;
        r = 2 * r;
    }

    loop {
        let mid = (l + r) / 2;
        let ret = test(mid);
        println!("left={} right={} mid={} ({:?})", l, r, mid, ret);
        if let Err(_) = ret {
            r = mid;
        } else {
            l = mid;
        }
        if r - l <= 1 {
            break;
        }
    }
}

This gives a value close to 32K on my Windows 7 VM. So MSDN actually is lying 😺 In fact they're simply not being accurate there, as all the ANSI variants of Windows API are just shims for the wide ones. EDIT: sorry this is just testing only one variable, but it's already sufficient. I tested again with separate variables of constant length, and the program was happily eating all my memory. So the limitation should be on sizes of individual variables.

@alexcrichton
Copy link
Member Author

@vadimcn it looks like so far it's almost always been doc tests in libcore that are failing (that's just from memory, not verified). The most recent failure indicates that libcore has 954 tests which means that so long as each test adds ~30 bytes to PATH it'll overflow eventually.

In our case get_dylib_search_paths will probably return paths looking like:

C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\obj\build\x86_64-pc-windows-msvc\stage1\lib\rustlib\x86_64-pc-windows-msvc\lib
C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\obj\build\x86_64-pc-windows-msvc\stage1-std\release\x86_64-pc-windows-msvc
C:\bot\slave\auto-win-gnu-32-opt-rustbuild\build\obj\build\x86_64-pc-windows-msvc\stage1-std\release\x86_64-pc-windows-msvc\deps

All of those together are 381 bytes which means of our 954 tests only about 84 have to fail this race (< 10%). The race here, that is, where because threads just append to PATH they append to an already lengthened PATH, and then restore the longer path (causing further threads to lengthen it).

We don't actually call SetEnvironmentVariable I believe, this is all just threaded through to CreateProcessW, and @xen0n's investigation seems to confirm that super long values will cause explosions.

Plus, this diagnosis fits the bill of the symptoms so far:

  • We know for sure it's not the command line.
  • It only happens in rustdoc tests, which are the only ones exercising this race
  • It rarely happens externally of our own build system, but it's also very rare to find a crate with 954 doc tests always tested on Windows
  • This happens sporadically, so it seems racy
  • The error coincides with the theory
  • After one test fails all others start failing as well as each new append would keep hitting the same limit.

@alexcrichton
Copy link
Member Author

Ah and this also only happens on WIndows, which is the only spot with that logic active.

@xen0n
Copy link
Contributor

xen0n commented Jun 5, 2016

Did further investigation and confirmed:

  • Other environment variables are not affected, only PATH.
  • The size limit is on the length of value, not on individual path entries. Even an overlong series of C:\;C:\;C:\; ... C:\ is triggering the error.

I'm not bothering to check implementation of CreateProcessW (in ReactOS though) for certainty, as it seems this is the only problem.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jun 5, 2016
This commit attempts to bring our prepends to PATH on Windows when loading
plugins because we've been seeing quite a few issues with failing to spawn a
process on Windows, the leading theory of which is that PATH is too large as a
result of this. Currently this is mostly a stab in the dark as it's not
confirmed to actually fix the problem, but it's probably not a bad change to
have anyway!

cc rust-lang#33844
Closes rust-lang#17360
bors added a commit that referenced this issue Jun 8, 2016
rustc: Try to contain prepends to PATH

This commit attempts to bring our prepends to PATH on Windows when loading
plugins because we've been seeing quite a few issues with failing to spawn a
process on Windows, the leading theory of which is that PATH is too large as a
result of this. Currently this is mostly a stab in the dark as it's not
confirmed to actually fix the problem, but it's probably not a bad change to
have anyway!

cc #33844
Closes #17360
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) O-windows Operating system: Windows
Projects
None yet
Development

No branches or pull requests

4 participants