-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distsql: add missing MoveToDraining calls #30096
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glad you found it... definitely adding more weight to the argument that we should be moving the burden of understanding and correctly dealing with metadata to a layer above individual processors.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/distsqlrun/sorter.go, line 557 at r1 (raw file):
s.trailingMeta = append(s.trailingMeta, *meta) if meta.Err != nil { return false, nil
MoveToDraining call missing here?
pkg/sql/distsqlrun/tablereader.go, line 247 at r1 (raw file):
if meta.Err != nil { tr.MoveToDraining(nil /* err */) break
Does this drop the metadata? If it does that would be hopefully caught by the metadata sender tests.
is there a way to test for this? |
It's hard to test for this condition, because it would mean that we would need to add unit tests for all error paths. I don't think that's necessarily infeasible, but it would be very difficult because the processor dependencies aren't interfaces and thus can't be substituted during testing for fake implementations that return errors. I think the right solution (that @asubiotto and @solongordon and I have been discussing) is to shift the responsibility of moving processors to the draining state from individual processor implementations to a wrapper that's in charge of various higher-level concerns like forwarding metadata, moving processors to draining on error, and other things that every processor really shouldn't have to reimplement. |
Yes, moving duplicate code to one place sounds like a fine plan. Thanks
…On Tue, 11 Sep, 2018, 11:25 PM Jordan Lewis, ***@***.***> wrote:
It's hard to test for this condition, because it would mean that we would
need to add unit tests for all error paths. I don't think that's
necessarily infeasible, but it would be very difficult because the
processor dependencies aren't interfaces and thus can't be substituted
during testing for fake implementations that return errors.
I think the right solution (that @asubiotto <https://github.com/asubiotto>
and @solongordon <https://github.com/solongordon> and I have been
discussing) is to shift the responsibility of moving processors to the
draining state from individual processor implementations to a wrapper
that's in charge of various higher-level concerns like forwarding metadata,
moving processors to draining on error, and other things that every
processor really shouldn't have to reimplement.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#30096 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ALOpBABZtVJgEiBMIBJWPU-SDkQsskXMks5uaH6YgaJpZM4WkKNM>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/distsqlrun/sorter.go, line 557 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
MoveToDraining call missing here?
It's handled in the caller.
pkg/sql/distsqlrun/tablereader.go, line 247 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Does this drop the metadata? If it does that would be hopefully caught by the metadata sender tests.
D'oh, looks that way. Thanks. (I don't think it would be caught by the tests because it's coming from the rowFetcherWrapper
.)
I found several places where processors were not checking for error metadata from their inputs and draining appropriately. The most egregious of these is in TableReader, which was not calling MoveToDraining when it encountered an error from the underlying RowFetcher. This meant that rather than draining it would continue calling NextRow and generating errors in a tight loop, which caused the query to hang as well as high CPU and memory usage on the affected node. Other affected processors are Distinct, SortChunks, and SortTopK. Fixes cockroachdb#29374 Fixes cockroachdb#29978 Release note: None
23ce7dc
to
2145501
Compare
bors r+ |
Canceled (will resume) |
30096: distsql: add missing MoveToDraining calls r=solongordon a=solongordon I found several places where processors were not checking for error metadata from their inputs and draining appropriately. The most egregious of these is in TableReader, which was not calling MoveToDraining when it encountered an error from the underlying RowFetcher. This meant that rather than draining it would continue calling NextRow and generating errors in a tight loop, which caused the query to hang as well as high CPU and memory usage on the affected node. Other affected processors are Distinct, SortChunks, and SortTopK. Fixes #29374 Fixes #29978 Release note: None Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
Build succeeded |
I found several places where processors were not checking for error
metadata from their inputs and draining appropriately. The most
egregious of these is in TableReader, which was not calling
MoveToDraining when it encountered an error from the underlying
RowFetcher. This meant that rather than draining it would continue
calling NextRow and generating errors in a tight loop, which caused the
query to hang as well as high CPU and memory usage on the affected node.
Other affected processors are Distinct, SortChunks, and SortTopK.
Fixes #29374
Fixes #29978
Release note: None