Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsql: add missing MoveToDraining calls #30096

Merged
merged 1 commit into from
Sep 12, 2018

Conversation

solongordon
Copy link
Contributor

I found several places where processors were not checking for error
metadata from their inputs and draining appropriately. The most
egregious of these is in TableReader, which was not calling
MoveToDraining when it encountered an error from the underlying
RowFetcher. This meant that rather than draining it would continue
calling NextRow and generating errors in a tight loop, which caused the
query to hang as well as high CPU and memory usage on the affected node.

Other affected processors are Distinct, SortChunks, and SortTopK.

Fixes #29374
Fixes #29978

Release note: None

@solongordon solongordon requested review from jordanlewis, asubiotto and a team September 11, 2018 20:21
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: glad you found it... definitely adding more weight to the argument that we should be moving the burden of understanding and correctly dealing with metadata to a layer above individual processors.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/distsqlrun/sorter.go, line 557 at r1 (raw file):

			s.trailingMeta = append(s.trailingMeta, *meta)
			if meta.Err != nil {
				return false, nil

MoveToDraining call missing here?


pkg/sql/distsqlrun/tablereader.go, line 247 at r1 (raw file):

			if meta.Err != nil {
				tr.MoveToDraining(nil /* err */)
				break

Does this drop the metadata? If it does that would be hopefully caught by the metadata sender tests.

@vivekmenezes
Copy link
Contributor

is there a way to test for this?

@jordanlewis
Copy link
Member

It's hard to test for this condition, because it would mean that we would need to add unit tests for all error paths. I don't think that's necessarily infeasible, but it would be very difficult because the processor dependencies aren't interfaces and thus can't be substituted during testing for fake implementations that return errors.

I think the right solution (that @asubiotto and @solongordon and I have been discussing) is to shift the responsibility of moving processors to the draining state from individual processor implementations to a wrapper that's in charge of various higher-level concerns like forwarding metadata, moving processors to draining on error, and other things that every processor really shouldn't have to reimplement.

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Sep 12, 2018 via email

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/distsqlrun/sorter.go, line 557 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

MoveToDraining call missing here?

It's handled in the caller.


pkg/sql/distsqlrun/tablereader.go, line 247 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Does this drop the metadata? If it does that would be hopefully caught by the metadata sender tests.

D'oh, looks that way. Thanks. (I don't think it would be caught by the tests because it's coming from the rowFetcherWrapper.)

I found several places where processors were not checking for error
metadata from their inputs and draining appropriately. The most
egregious of these is in TableReader, which was not calling
MoveToDraining when it encountered an error from the underlying
RowFetcher. This meant that rather than draining it would continue
calling NextRow and generating errors in a tight loop, which caused the
query to hang as well as high CPU and memory usage on the affected node.

Other affected processors are Distinct, SortChunks, and SortTopK.

Fixes cockroachdb#29374
Fixes cockroachdb#29978

Release note: None
@solongordon
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 12, 2018

Canceled (will resume)

craig bot pushed a commit that referenced this pull request Sep 12, 2018
30096: distsql: add missing MoveToDraining calls r=solongordon a=solongordon

I found several places where processors were not checking for error
metadata from their inputs and draining appropriately. The most
egregious of these is in TableReader, which was not calling
MoveToDraining when it encountered an error from the underlying
RowFetcher. This meant that rather than draining it would continue
calling NextRow and generating errors in a tight loop, which caused the
query to hang as well as high CPU and memory usage on the affected node.

Other affected processors are Distinct, SortChunks, and SortTopK.

Fixes #29374
Fixes #29978

Release note: None

Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Sep 12, 2018

Build succeeded

@craig craig bot merged commit 2145501 into cockroachdb:master Sep 12, 2018
@solongordon solongordon deleted the drain-on-err branch September 13, 2018 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants