reading plugin stderr read |0: file already closed #116

Closed
skriss opened this issue May 2, 2019 · 10 comments · Fixed by #118

Comments

skriss (Contributor) commented May 2, 2019

I'm seeing intermittent errors being logged from go-plugin with a message of "reading plugin stderr", and an error of "read |0: file already closed". This seems to happen when plugins are terminated.

In looking at https://golang.org/src/os/exec/exec.go?s=17633:17682#L601, and client.go, it seems like there may be a race condition where the reader side of the pipe is closed after the command exits, but the go-plugin client is still trying to read from it. The client's logStderr is looking for an io.EOF error, but in this case I think it's actually getting some variant of the os.ErrClosed error. I'm not familiar enough with the code to understand exactly what's going on/what should be changed, though. Hope this can point someone in the right direction!
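
For illustration, here is a small standalone program (not from go-plugin; it assumes a Unix-like shell, and the errors.Is checks need Go 1.13+) showing the error class involved: once Wait() has closed the read end of the stderr pipe, a later read returns an error wrapping os.ErrClosed rather than io.EOF.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("sh", "-c", "echo from stderr >&2")
	stderr, err := cmd.StderrPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Wait() closes the read end of the stderr pipe once the process exits.
	_ = cmd.Wait()

	buf := make([]byte, 64)
	_, err = stderr.Read(buf)
	fmt.Println(err)                          // e.g. "read |0: file already closed"
	fmt.Println(errors.Is(err, io.EOF))       // false
	fmt.Println(errors.Is(err, os.ErrClosed)) // true: the error wraps os.ErrClosed
}
```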

skriss (Contributor, Author) commented May 6, 2019

Anyone had a chance to look at this? cc @jbardin

jbardin (Member) commented May 6, 2019

Hi @skriss,

I haven't seen this myself yet, and it does appear that Wait should not be called until the logger is done reading. It's not a fatal error (we're only logging it, there's no action taken), but is it causing a problem for you in other ways?

Do you have a way to replicate the error, or any other log messages from the go-plugin package?

skriss (Contributor, Author) commented May 6, 2019

@jbardin thanks for taking a look. It's not causing any functional issues other than writing a bunch of errors to the log.

It's pretty easily replicated in our project which makes use of go-plugin, though as I said it's sporadic, which implies to me it's probably a race condition. There aren't any other useful logs emitted around the same time. Let me see if I can come up with a simpler repro for you that doesn't involve deploying a Kubernetes cluster :)

skriss (Contributor, Author) commented May 6, 2019

My thought was that the switch statement at https://github.com/hashicorp/go-plugin/blob/master/client.go#L947 should actually be checking for os.ErrClosed instead of io.EOF. It looks like in a previous version of the code, only the write end of the pipe being used was closed, which resulted in io.EOF being returned; but now, with the use of os/exec's StderrPipe, the read end is closed, so you get os.ErrClosed (well, I think it's actually a wrapped version of that).
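
For illustration, a check of that shape might look something like the following. This is just a sketch, not the actual logStderr (whose structure differs); errors.Is needs Go 1.13+, and on older toolchains you would unwrap the *os.PathError instead.

```go
package pluginlog

import (
	"bufio"
	"errors"
	"io"
	"os"
	"strings"
)

// drainStderr is illustrative only. It treats both io.EOF (writer end closed)
// and a pipe whose read end was closed underneath it (os.ErrClosed) as normal
// termination, and only logs other errors.
func drainStderr(r io.Reader, logLine func(string)) {
	br := bufio.NewReader(r)
	for {
		line, err := br.ReadString('\n')
		if line != "" {
			logLine(strings.TrimRight(line, "\n"))
		}
		if err == nil {
			continue
		}
		switch {
		case err == io.EOF:
			return // plugin exited and its stderr is fully drained
		case errors.Is(err, os.ErrClosed):
			return // read end closed by cmd.Wait(); also treat as shutdown
		default:
			logLine("reading plugin stderr: " + err.Error())
			return
		}
	}
}
```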

jbardin (Member) commented May 6, 2019

Yes, that switch is where we would need to add the check for os.ErrClosed, but only if that's actually an error we should be expecting there (and it's more likely a PathError we'd have to check for first).

I would at least like to understand why we're seeing this before ignoring the error completely, however. A repro would be great, but even a description of how you're configuring and using the plugin might help.
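
For concreteness, a PathError-first check might look like the sketch below. This is illustrative only, not a proposed client.go change; it is written in the pre-Go-1.13 style, since errors.Is did not exist when this thread was written.

```go
package pluginlog

import "os"

// isClosedPipe reports whether err is the "file already closed" error returned
// by a read on the pipe after cmd.Wait() has closed its read end.
// Illustrative only; checks the *os.PathError before the bare sentinel.
func isClosedPipe(err error) bool {
	if perr, ok := err.(*os.PathError); ok {
		return perr.Err == os.ErrClosed
	}
	return err == os.ErrClosed
}
```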

abhinavdahiya added a commit to abhinavdahiya/terraform-provider-ignition that referenced this issue May 13, 2019
…een invocations

With errors like issue 116 [1], terraform restarts the provider server. And with terraform turning
on `AutoMTLS` for all plugins [2], the client cannot reattach to the already-running provider server [3].

The provider's memory-only cache then creates errors like `invalid file "49e2cc7ba023d87070d9b49a0c8f02ff0f63b7f61b390fbce8f64b011bd42ed7", unknown file id`
because the ignition_config resource tried to retrieve a file id stored in the cache before the provider server was restarted.

This stores the objects in a `diskv` store in the user's cache directory, retrieving them when the memory cache is not populated.
The in-memory cache keeps providing no-overhead access to the objects.

[1]: hashicorp/go-plugin#116
[2]: hashicorp/terraform#19560
[3]: https://github.com/hashicorp/go-plugin/blob/5692942914bbdbc03558fde936b1f0bc2af365be/client.go#L203-L204
skriss (Contributor, Author) commented May 20, 2019

@abhinavdahiya are you seeing this error as well?

skriss (Contributor, Author) commented May 20, 2019

@jbardin it looks like if you add a time.Sleep(time.Second) just above https://github.com/hashicorp/go-plugin/blob/master/client.go#L946, you should be able to reproduce this pretty easily. There's nothing preventing https://github.com/hashicorp/go-plugin/blob/master/client.go#L606 from executing before logStderr has finished reading everything and gotten an io.EOF; if .Wait() gets called before that happens, the read half of the stderr pipe gets closed by Wait(), and the next read in logStderr gets the "file already closed" error.

I implemented a super-simple fix in the following branch that seems to prevent the issue from happening (and hopefully clearly demonstrates the race): https://github.com/hashicorp/go-plugin/compare/master...skriss:fix-stderr-race?expand=1. Happy to submit it as a PR if you like it as-is; otherwise I can change the approach as needed.
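
For reference, here is a minimal standalone sketch of that ordering (illustrative names, not the code from the branch above; assumes a Unix-like shell): cmd.Wait() is only called after the stderr-reading goroutine has drained the pipe and returned, so Wait() can never close the pipe out from under the logger.

```go
package main

import (
	"bufio"
	"log"
	"os/exec"
	"sync"
)

func main() {
	cmd := exec.Command("sh", "-c", "echo plugin log line >&2")
	stderr, err := cmd.StderrPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	var logDone sync.WaitGroup
	logDone.Add(1) // registered right where the goroutine is dispatched
	go func() {
		defer logDone.Done()
		sc := bufio.NewScanner(stderr)
		for sc.Scan() {
			log.Printf("plugin stderr: %s", sc.Text())
		}
		// Scan returns false once the child exits and the pipe hits EOF.
	}()

	// Only call Wait() (which closes the read end of the pipe) after the
	// logger has drained stderr and returned.
	logDone.Wait()
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}
```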

jbardin (Member) commented May 20, 2019

Oh yes, thanks for tracking that down! The logger might not have entered the Read before Wait, and could be executing in any part of that loop when the process exits. While the error here isn't critical, it would be best to have the termination steps properly ordered for correctness.

I think the solution here seems reasonable. I would just suggest that we re-organize the code a bit so that the .Add(1) call is local to where the goroutine is dispatched. It doesn't look like that would be too hard in this case.

skriss (Contributor, Author) commented May 20, 2019

I would just suggest that we re-organize the code a bit so that the .Add(1) call is local to where the goroutine is dispatched. It doesn't look like that would be too hard in this case.

This makes sense -- the only issue I see here is that we want to call .Add(1) before starting the goroutine that calls cmd.Wait(), whereas currently go c.logStderr(cmdStderr) happens after. Is there any issue with moving the block that runs go c.logStderr(cmdStderr) to just above the goroutine that calls cmd.Wait()? It looks like it should be OK.

jbardin (Member) commented May 21, 2019

@skriss: yes, that's the minor reorganization I had in mind. The order the goroutines are declared shouldn't matter, as long as the waitgroup is set up correctly and we're not changing anything in the closure.

nikhilsaraf added a commit to nikhilsaraf/kelp that referenced this issue Feb 28, 2020
nikhilsaraf added a commit to stellar-deprecated/kelp that referenced this issue Mar 2, 2020
…#386)

* 1 - attempt to fix stream closing issues in kelpos.Blocking()

* 2 - only close manually created pipes, not Std* pipes

* 3 - the actual fix, Wait() after Read()

see: hashicorp/go-plugin#116 (comment)

* 4 - remove IPC server and components since that is causing an explosion of pipes/files and is also unused

closes #385