Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: use the rate from the log files #130107

Merged
merged 6 commits into from
Sep 14, 2024

Conversation

andrewbaptist
Copy link
Collaborator

@andrewbaptist andrewbaptist commented Sep 4, 2024

Previously we would track the throughput and latency from the listener
on the nodes directly. This had problems in certain cases and added a
new failure mode. This change switches to pulling the rate from the log
files after the test completes.

Fixes: #129618
Fixes: #129113

Epic: none

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andrewbaptist andrewbaptist changed the title 2024 09 04 perturbation data roachtest: use the rate from the log files Sep 4, 2024
@andrewbaptist andrewbaptist force-pushed the 2024-09-04-perturbation-data branch 5 times, most recently from 2c2ad7a to f49dc2a Compare September 6, 2024 01:06
@andrewbaptist andrewbaptist marked this pull request as ready for review September 6, 2024 01:25
@andrewbaptist andrewbaptist requested a review from a team as a code owner September 6, 2024 01:25
@andrewbaptist andrewbaptist requested review from nameisbhaskar and DarrylWong and removed request for a team September 6, 2024 01:25
@andrewbaptist andrewbaptist force-pushed the 2024-09-04-perturbation-data branch 2 times, most recently from 8868d3c to 229219f Compare September 6, 2024 18:44
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. It would help understand why we're making these changes if there were an example you could share on the PR, which motivates switching from the earlier workload recording to this.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 3 of 3 files at r4, 1 of 1 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @DarrylWong, and @nameisbhaskar)


-- commits line 12 at r2:
For what test? This could apply to many of the hundreds of roachtests under roachtest


-- commits line 24 at r3:
Same as above, although the change is in a util file, the only user is the admission_control_latency test.


-- commits line 35 at r4:
Same as above.


-- commits line 37 at r4:
nit: Previously


-- commits line 49 at r5:
Which test?


-- commits line 58 at r6:
For which test? This isn't done generally for all roachtests.


pkg/cmd/roachtest/tests/admission_control_latency.go line 578 at r2 (raw file):

	clusterMaxRate := make(chan int)
	m.Go(func(ctx context.Context) error {
		waitDuration(ctx, 3*v.fillDuration/4)

Could you add a comment explaining the variables? I understand the /4 but not the 3 *.


pkg/cmd/roachtest/tests/admission_control_latency.go line 685 at r4 (raw file):

	t.Status("T5: validating results")

	// collect files

nit: for this and the below comment, use proper sentences.


pkg/cmd/roachtest/tests/admission_control_latency.go line 73 at r6 (raw file):

// TODO(baptist): Add a timeline describing the test in more detail.
type variations struct {
	// cluster is set up at the start of the test run

nit: period.


pkg/cmd/roachtest/tests/admission_control_latency.go line 74 at r6 (raw file):

type variations struct {
	// cluster is set up at the start of the test run
	cluster.Cluster

nit: I'm not sure how much value this gets us, none of the call lines are that long or tedious that it makes sense. I'm fine merging this though.

Previously there was code to output JSON but no parsing utility. This
commit adds parsing.

Epic: cockroachdb#129671

Release note: None
Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR

I cleaned up the commit messages and some of the logs. I also added the two tests that this fixes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @kvoli, and @nameisbhaskar)


-- commits line 12 at r2:

Previously, kvoli (Austen) wrote…

For what test? This could apply to many of the hundreds of roachtests under roachtest

I added perturbation/* to the commit message


-- commits line 24 at r3:

Previously, kvoli (Austen) wrote…

Same as above, although the change is in a util file, the only user is the admission_control_latency test.

I clarified that this is only used in perturbation tests today, but will likely be used in other tests in the future.


-- commits line 35 at r4:

Previously, kvoli (Austen) wrote…

Same as above.

Fixed


-- commits line 37 at r4:

Previously, kvoli (Austen) wrote…

nit: Previously

Fixed


-- commits line 49 at r5:

Previously, kvoli (Austen) wrote…

Which test?

Added


-- commits line 58 at r6:

Previously, kvoli (Austen) wrote…

For which test? This isn't done generally for all roachtests.

Added


pkg/cmd/roachtest/tests/admission_control_latency.go line 578 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Could you add a comment explaining the variables? I understand the /4 but not the 3 *.

I added a comment and made it more clear that this is 3/4 of the time waiting and 1/4 collecting stats.


pkg/cmd/roachtest/tests/admission_control_latency.go line 685 at r4 (raw file):

Previously, kvoli (Austen) wrote…

nit: for this and the below comment, use proper sentences.

I removed these comments, these were placeholders for me initally that I left in. They weren't necessary based on the context.


pkg/cmd/roachtest/tests/admission_control_latency.go line 73 at r6 (raw file):

Previously, kvoli (Austen) wrote…

nit: period.

Done


pkg/cmd/roachtest/tests/admission_control_latency.go line 74 at r6 (raw file):

Previously, kvoli (Austen) wrote…

nit: I'm not sure how much value this gets us, none of the call lines are that long or tedious that it makes sense. I'm fine merging this though.

It was slightly annoying to have to keep referring to the cluster. It was a fairly minor change though, so I'll leave it.

Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @DarrylWong, and @nameisbhaskar)


-- commits line 12 at r2:

Previously, andrewbaptist (Andrew Baptist) wrote…

I added perturbation/* to the commit message

Could you also add something to the commit header, like:

roachtest: query throughput using SQL for pertubation tests

Previously the rate for the cluster was collected from the workload
nodes. In an effort to switch away from the dependence on the roachtest
node in the critical path, this commit changes all the perturbation/*
tests to use the rate from the cluster nodes directly.

Epic: cockroachdb#129671

Release note: None
Previously the latency for profileTopStatements needed to be manually
passed in. This commit changes it to detect automatically based on 10x
the P99 latency. This is only used in the perturbation/* tests today.

Epic: cockroachdb#129671

Release note: None
Previously we would track the throughput and latency from the listener
on the nodes directly. This had problems in certain cases and added a
new failure mode. This change switches to pulling the rate from the log
files after the test completes.

Epic: cockroachdb#129671

Release note: None
Previously the perturbation/* tests did not check for errors in the
workers it launched. This made it hard to track down bugs or problems
with running the workload.

Epic: cockroachdb#129671

Release note: None
Having the struct embedded makes it easier to use.

Epic: none

Release note: None
Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DarrylWong, @kvoli, and @nameisbhaskar)


-- commits line 12 at r2:

Previously, kvoli (Austen) wrote…

Could you also add something to the commit header, like:

roachtest: query throughput using SQL for pertubation tests

Done

@andrewbaptist
Copy link
Collaborator Author

TFTR!

bors r=kvoli

@craig craig bot merged commit 99b7da5 into cockroachdb:master Sep 14, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: perturbation/metamorphic/partition failed roachtest: perturbation/metamorphic/decommission failed
3 participants