Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conmon: drop usage of splice(2) #131

Merged
merged 3 commits into from
Mar 20, 2020

Conversation

giuseppe
Copy link
Member

splice(2) doesn't seem to handle correctly sockets having a type of SOCK_SEQPACKET.

When using a SOCK_SEQPACKET, each read is limited to the current packet. The data that doesn't fit in a single read(2) call will be discarded. This can be a problem when the fd_out doesn't accept all the data, so on a short write we lose the data that could not be written by the splice(2) call.

Replace splice(2) with splitting the read/write loop.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
splice(2) doesn't seem to handle correctly sockets having a type of
SOCK_SEQPACKET.

When using a SOCK_SEQPACKET, each read is limited to the current
packet.  The data that doesn't fit in a single read(2) call will be
discarded.  This can be a problem when the fd_out doesn't accept all
the data, so on a short write we lose the data that could not be
written by the splice(2) call.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
src/conmon.c Outdated Show resolved Hide resolved
src/conmon.c Outdated Show resolved Hide resolved
src/conmon.c Outdated Show resolved Hide resolved
@haircommander
Copy link
Collaborator

good find!

@giuseppe
Copy link
Member Author

fixed the comments and pushed a new version

Comment on lines +712 to +714
if (w < 0) {
nwarn("Failed to write to container stdin");
} else {
Copy link
Collaborator

@haircommander haircommander Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stylistically, this is inconsistent with line 727 WRT the braces on a one line if statement. I don't mind which, but we should probably choose one for the project 😄

src/conmon.c Outdated Show resolved Hide resolved

/* There is still data in the buffer. */
if (sock->remaining) {
sock->data_ready = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. Above has_data is capital, but here is lower case. I vote we choose one and stick to it

@haircommander
Copy link
Collaborator

sorry, a couple more nits, then LGTM

@haircommander
Copy link
Collaborator

uh @cevich

Failed to start: Error getting access token for service account: 400 Bad Request {"error":"invalid_grant","error_description":"Invalid grant: account not found"}
Failed to stop: Error getting access token for service account: 400 Bad Request {"error":"invalid_grant","error_description":"Invalid grant: account not found"}

PTAL

@mheon
Copy link
Member

mheon commented Mar 18, 2020

Are we thinking this is the cause of our exec bug? If so, it's actually amazing that it only happens with exec - this definitely looks like a case that run would hit as well. Regardless, great catch @giuseppe

@haircommander
Copy link
Collaborator

@giuseppe is this still a draft?

@giuseppe
Copy link
Member Author

Are we thinking this is the cause of our exec bug? If so, it's actually amazing that it only happens with exec - this definitely looks like a case that run would hit as well. Regardless, great catch @giuseppe

I could easily reproduce the issue you've reported locally and this patch solves it for me. Do we use a different buffer size in Podman for run? The 32kb buffer size used for exec is relevant to the reproducibility of the issue.

@giuseppe giuseppe marked this pull request as ready for review March 18, 2020 19:00
@giuseppe
Copy link
Member Author

@giuseppe is this still a draft?

marked ready for review

@haircommander
Copy link
Collaborator

haircommander commented Mar 18, 2020

LGTM (test failures are not the fault of this PR)

@haircommander
Copy link
Collaborator

haircommander commented Mar 18, 2020

you may need to repush to trigger the tests

jk they're not fixed yet

now that we dropped splice(2), we need to make sure the read/write
loop doesn't hang indefinitely conmon.

Attempting to write to a fd that is not ready for write causes conmon
to block on the write(2) call, at the same time the container process
might be blocked attempting to write to stderr but since conmon is not
processing any data from the container, both processes are dead
locked.

Keep in a buffer what we read from the pipe and write to the
masterfd_stdin only when it is ready for write.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@TomSweeneyRedHat
Copy link
Member

Other than @haircommander's nits and unhappy tests, LGTM

@cevich
Copy link
Member

cevich commented Mar 19, 2020

FYI: Cirrus-CI testing in GCP should un-break shortly (hours).

@giuseppe
Copy link
Member Author

tests are green

@haircommander haircommander merged commit 1b53637 into containers:master Mar 20, 2020
@rhatdan
Copy link
Member

rhatdan commented Mar 20, 2020

Need to cut a release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants