Random appveyor x86 build failures #792

Closed
jagerman opened this issue Apr 9, 2017 · 21 comments

Comments

@jagerman
Member

jagerman commented Apr 9, 2017

I'm seeing occasional random build failures on the appveyor x86 builds, always during the linking stage. I have a feeling we're running into memory limitations, perhaps combined with the addition of the /m flag for parallel building and the new multi-CPU abilities. (I haven't yet seen a failure for exactly the same builds on https://ci.appveyor.com/project/jagerman/pybind11.) If this persists, perhaps we should turn off the /m flag? (I'm assuming--though I may be wrong--that it defaults to 2 cores for the main project builds and 1 for my open-source-free account.)
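
For readers unfamiliar with the flag: /m is MSBuild's maxcpucount switch, which lets MSBuild build several projects in parallel and so can raise peak memory use on the build VM. A minimal sketch of what toggling it might look like in appveyor.yml follows; the build command here is an assumption for illustration, not necessarily the project's actual build step.

```yaml
build_script:
  # Hypothetical build step; the project's real appveyor.yml may differ.
  # With /m, MSBuild may run one build process per available core:
  #   - cmake --build . --config Release -- /m
  # Without /m, MSBuild uses a single build process:
  - cmake --build . --config Release
```

Arguments after the bare `--` are passed by `cmake --build` to the native build tool, which is how /m reaches MSBuild in this sketch.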

@jagerman
Member Author

jagerman commented Apr 9, 2017

Another option: we could disable the eigen tests on the 2 appveyor x86 builds; in practice it seems to be the most resource-heavy test script.

@jagerman
Member Author

jagerman commented Apr 9, 2017

Example: pre-merge build and post-merge build, with no changes to master between the PR and the merge.

@wjakob
Member

wjakob commented Apr 9, 2017

Removing the /m flag sounds good to me, that doesn't really seem to have helped in any way.

@jagerman
Member Author

jagerman commented Apr 9, 2017

I removed it; we'll see if it helps.

@jagerman
Member Author

jagerman commented Apr 9, 2017

Another failure without /m (this time on jagerman not wjakob): https://ci.appveyor.com/project/jagerman/pybind11/build/1.0.529

And this strange one: https://ci.appveyor.com/project/wjakob/pybind11/build/1.0.1648/job/jrflasp8e5tsr95l

I have a feeling that there are some appveyor x86 issues.

@jagerman
Member Author

And another, which worked here.

@jagerman
Member Author

Now every PR is triggering it, even on x64. Um...

@wjakob
Member

wjakob commented Apr 10, 2017

Huh? I don't see that here.

@jagerman
Member Author

Okay, not every; but it did show up on the first and second x64 builds, so I guess this isn't purely x86-specific.

I'm trying to reproduce it in a Win 10 VM, watching the compiler and linker memory usage; no issues so far (max memory usage slightly over 500MB), but I had only the VS 2017 RC installed. I'm updating now to investigate some more.

@wjakob
Member

wjakob commented Apr 11, 2017

@jagerman
Member Author

@jagerman
Member Author

I got an RDP connection to the appveyor VM immediately after a build failure had occurred on it, ran the build manually, and it completed successfully without error. I couldn't find anything in the logs to suggest a problem.

(For future reference, changing the on_failure to:

on_failure:
  - ps: $blockRdp = $true; iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1'))

will block the build and print RDP connection info in the build log, but the VM only stays alive for 1 hour.)

@wjakob
Member

wjakob commented Apr 11, 2017

The RDP feature is neat, but I don't think sticking it into on_failure is a good idea because it will block all AppVeyor builds for 1 hour.
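
One way to keep the hook available without blocking every failed build would be to gate it on an opt-in variable. This is only a sketch and was not proposed in the thread: `DEBUG_RDP` is a hypothetical environment variable that someone would set on a debugging branch or in the project settings.

```yaml
on_failure:
  # DEBUG_RDP is a hypothetical opt-in variable, not an AppVeyor built-in.
  # When it is unset, failed builds finish immediately instead of blocking.
  - ps: if ($env:DEBUG_RDP -eq 'true') { $blockRdp = $true; iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1')) }
```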

@wjakob
Member

wjakob commented Apr 11, 2017

I suspect that there is some kind of MSVC ICE (internal compiler error) Heisenbug. When cl.exe crashes with an ICE, the panic message is generally not picked up by AppVeyor (likely due to the custom Appveyor.MSBuildLogger.dll solution).

@jagerman
Member Author

Oh, for sure; I wasn't suggesting we add it, just making a note for how to use it.

@jagerman
Member Author

From the appveyor forum bug report:

I have a feeling that "Command exited with code -1" error might be something related to VS updates as 15.1 is not the latest version, but 15.1.1 was recently released: https://www.visualstudio.com/en-us/news/releasenotes/vs2017-relnotes

We are going to roll out image update with 15.1.1 in the coming days - will see if that fixes the issue.

@jagerman
Member Author

I disabled fast_finish (b4cbd7a) at least for now, so that the random failures don't prevent other jobs from running.
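
For reference, fast_finish is a setting under the matrix section of appveyor.yml. The fragment below is a sketch of the disabled state, not a copy of the commit itself; removing the key altogether should have the same effect, since the setting is off unless enabled.

```yaml
matrix:
  # With fast_finish: true, AppVeyor ends the whole build as soon as one job fails.
  # Setting it to false lets the remaining jobs in the matrix run to completion.
  fast_finish: false
```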

@jagerman
Member Author

Confirmation -- it's not just us!

@jagerman
Member Author

Appveyor believes this is fixed now, as of about an hour ago; I'll close this, but reopen if the "exited with code -1" error comes up again!

@jagerman
Member Author

Appveyor believes this is fixed with the most recent update.
