-
-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bash-engine: Check stdout readable before getline #211
Conversation
Pull SimonKagstrom#193 (written to resolve SimonKagstrom#192) caused the mbland/go-script-bash Travis build to fail. That build clones SimonKagstrom/kcov each time, and after that pull request, kcov began hanging after printing the number of Bats test cases on Linux, e.g.: https://travis-ci.org/mbland/go-script-bash/jobs/246933167 This change resolves the issue by calling `file_readable` on standard output before calling `getline`. However, I am still working to write a focused test case, both to understand the exact problem and to prevent a regression.
Hmm, there's something decidedly funky about that failed Linux Clang build. Maybe just restart that one task? Also, the reason the coverage went down slightly is because the new |
And, duh, an
This is where it started to hang. So a buffer was getting filled, but it wasn't stdout. I'm actually going to update the last commit to include this message; guess it'll kick off some new builds as well. |
Per the previous commit, this reproduces the problem caused by SimonKagstrom#193 whereby multiple calls to `getline` within the `handleStdout` function in `src/engines/bash-engine.cc` would cause kcov to hang indefinitely. The problem was due to the fact that the buffer to which Bash was writing kcov tracing messages was getting full, so Bash would eventually block on `write` calls to the `xtraceFd` (FD 782). After that point, calls to `getline` to read the standard output from Bash would block as well, creating a deadlock between the kcov parent and Bash child processes. Using a build of kcov from before the previous commit, running an `strace` command similar to the following revealed that the xtraceFd (FD 782) buffer was getting full: ``` $ strace -f build/src/kcov coverage_dir tests/bash/handle-all-output.sh [ ...snip ... ] [pid 30363] <... read resumed> "550\n", 4096) = 4 [pid 30363] write(1, "550\n", 4550 ) = 4 [pid 30363] read(5, <unfinished ...> [pid 30366] <... write resumed> ) = 4 [pid 30366] write(782, "kcov@handle-all-output.sh@3@(( +"..., 38) = 38 [pid 30366] write(782, "kcov@handle-all-output.sh@3@(( i"..., 44) = 44 [pid 30366] write(782, "kcov@handle-all-output.sh@4@echo"..., 37 ``` At this point the kcov (pid 30363) and Bash (pid 30366) processes hang indefinitely. The new `bash_handle_all_output` test case reliably reproduces the failure and validates the fix from the previous commit. It required adding a timeout capability to `KcovTestCase.do`. I've also removed `@unittest.expectedFailure` from `bash_subshell`, as it isn't necessary and confirms that the desired behavior implemented by SimonKagstrom#193 remains intact.
Codecov Report
@@ Coverage Diff @@
## master #211 +/- ##
==========================================
- Coverage 62.69% 62.68% -0.02%
==========================================
Files 56 56
Lines 3699 3698 -1
==========================================
- Hits 2319 2318 -1
Misses 1380 1380
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #211 +/- ##
==========================================
- Coverage 62.69% 62.68% -0.02%
==========================================
Files 56 56
Lines 3699 3698 -1
==========================================
- Hits 2319 2318 -1
Misses 1380 1380
Continue to review full report at Codecov.
|
Wow, there's something seriously strange going on with that Linux Clang build. Wonder if it has to do with the Trusty update noted just before the logs:
|
The Clang issue is probably caused by me using clang-3.9 from a PPA. In the system-mode-ng branch, I've changed a few things around this, so the clang builds should work again once that's merged. Don't worry about that problem. Anyway, thanks for the fix! Also compliments for the very detailed commit message and bug report! |
My pleasure! Thanks for merging, and for such a great tool! (I mean, c'mon, code coverage for Bash—who knew?) 😄 |
I noticed during the build for #227 that test coverage dropped precipitously for no apparent reason: https://coveralls.io/builds/14508796 The coverage dropped because SimonKagstrom/kcov@0dd22bd added the ability for kcov to automatically discover all scripts residing in the same directory of the script under test. In this case, it unfortunately discovered all the scripts in ./git and tests/kcov, as well as various *.md and *.yml files that happened to have "bash" in the first line regardless of the presence of a #!. The interim fix for this was to add the --bash-dont-parse-binary-dir and to update the tests accordingly. Also, since a bit of time elapsed since I returned to the problem, I realized the latest tip of kcov's master branch exhibited the hanging bug I helped squash in SimonKagstrom/kcov#211. The bug was reintroduced in SimonKagstrom/kcov@ad17136, which tried to fix a bug reported in SimonKagstrom/kcov#248 whereby kcov would hang on output that didn't contain newlines. SimonKagstrom/kcov#249 attempted to resolve the bug, but didn't quite work and was abandoned. I may take a stab at fixing it one day, but in the meanwhile, kcov v35 with --bash-dont-parse-binary-dir works to get this project's test coverage back in shape.
Pull #193 (written to resolve #192) caused the mbland/go-script-bash Travis build to fail. That build clones SimonKagstrom/kcov each time, and after that pull request, kcov began hanging after printing the number of Bats test cases on Linux, e.g.:
This change resolves the issue by calling
file_readable
on standard output before callinggetline
.Though I don't have a perfect understanding yet, it appears that Bash scripts that run for some amount of time and produce some amount of standard output eventually end up blocking when trying to write. Previously, when the parent
kcov
process would subsequently block ingetline
, the child process running the Bash script somehow wasn't able to resume, implying some sort of deadlock.At any rate, the new
bash_handle_all_output
test case reliably reproduces the failure and validates the fix. It required adding a timeout capability toKcovTestCase.do
.I've also removed
@unittest.expectedFailure
frombash_subshell
, as it isn't necessary and it confirms that the behavior implemented by #193 remains intact.