Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: determine impact of sql.stats.automatic_collection.enabled cluster setting on sysbench oltp_read_write #135988

Open
tbg opened this issue Nov 22, 2024 · 12 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months

Comments

@tbg
Copy link
Member

tbg commented Nov 22, 2024

See here.

image

Epic: CRDB-42584

Jira issue: CRDB-44818

@tbg tbg added C-performance Perf of queries or internals. Solution not expected to change functional behavior. branch-master Failures and bugs on the master branch. T-sql-queries SQL Queries Team P-1 Issues/test failures with a fix SLA of 1 month o-perf-efficiency Related to performance efficiency labels Nov 22, 2024
Copy link

blathers-crl bot commented Nov 22, 2024

This issue has multiple T-eam labels. Please make sure it only has one, or else issue synchronization will not work correctly.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@michae2
Copy link
Collaborator

michae2 commented Nov 26, 2024

@tbg could you clarify whether this is about SQL table statistics, or observability statement statistics? They're both informally called "SQL stats" unfortunately, but they are not the same.

@michae2
Copy link
Collaborator

michae2 commented Nov 26, 2024

Oh, based on this thread, it seems to be the SQL statement stats owned by observability? Moving.

@michae2 michae2 added T-observability and removed T-sql-queries SQL Queries Team labels Nov 26, 2024
@michae2 michae2 removed this from SQL Queries Nov 26, 2024
@ajstorm
Copy link
Collaborator

ajstorm commented Nov 27, 2024

@tbg I opened #136324 for the specific issue of flushing SQL stats. Along with #135996, does that cover everything in this issue?

@tbg
Copy link
Member Author

tbg commented Nov 28, 2024

This issue came from turning off both sql.stats.automatic_collection and sql.stats.flush. Looks like you've got the latter covered in #136324. #135996 is about sql.metrics.statement_details.enabled. That means we're not yet covering sql.stats.automatic_collection. Your data here was inconclusive for this setting (it's unlikely that disabling it made anything slower, so I think we truly don't know the impact).

I think that setting has to do with full table scans that recreate table statistics, so it seems likely to have an impact:

// processing the current table longer than the refresh
// interval, look up the table descriptor to ensure we don't
// have stale table settings.
if elapsed > DefaultRefreshInterval || refreshingAllTables {
desc = r.getTableDescriptor(ctx, tableID)
if desc != nil {
if !r.autoStatsEnabled(desc) {
continue
}
if settingOverrides == nil {
settingOverrides = make(map[descpb.ID]catpb.AutoStatsSettings)
}
autoStatsSettings := desc.GetAutoStatsSettings()
if autoStatsSettings == nil {
delete(settingOverrides, tableID)
} else {
settingOverrides[tableID] = *autoStatsSettings

@mgartner could you chime in?

@ajstorm
Copy link
Collaborator

ajstorm commented Nov 28, 2024

I just kicked off a run where I'm testing this again on top of the 4 cluster settings which we know improve performance and stability. I'm hoping that this will give a more reliable result. Will report back when I have that data.

@ajstorm
Copy link
Collaborator

ajstorm commented Nov 29, 2024

In the repeated run, I'm still seeing a ~3% degradation with sql.stats.automatic_collection disabled. Is it possible that when we disable this setting, we're running with bad/no stats, and it's resulting in sub-optimal query plans?

@tbg
Copy link
Member Author

tbg commented Dec 2, 2024

Is it possible that when we disable this setting, we're running with bad/no stats, and it's resulting in sub-optimal query plans?

That could be possible. At the end of the runs with these settings, do we have any stats for these tables at all? When I disabled this, it was on a longer-running cluster. I think we do want stats, but measure the overhead of repeatedly recreating those stats if the underlying tables don't really change.

@xinhaoz xinhaoz assigned xinhaoz and unassigned xinhaoz Dec 3, 2024
@xinhaoz
Copy link
Member

xinhaoz commented Dec 3, 2024

Can I confirm that with the creation of the issues for flush and sql exec stats recording, this issue is only tracking the impact of sql.stats.automatic_collection.enabled? If so I'm going to transfer it to queries. Mistakenly assigned myself earlier since I thought it was around the obs issues (thanks for filing, Adam).

@yuzefovich yuzefovich added T-sql-queries SQL Queries Team and removed T-observability labels Dec 4, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Dec 4, 2024
@yuzefovich yuzefovich changed the title sql: sql stmt statistics jobs have massive negative impact on oltp_read_only sql: sql table statistics jobs have massive negative impact on oltp_read_only Dec 4, 2024
@tbg tbg changed the title sql: sql table statistics jobs have massive negative impact on oltp_read_only sql: determine impact of sql.stats.automatic_collection.enabled cluster setting on sysbench oltp_read_write Dec 4, 2024
@tbg tbg added P-2 Issues/test failures with a fix SLA of 3 months and removed P-1 Issues/test failures with a fix SLA of 1 month labels Dec 4, 2024
@yuzefovich yuzefovich removed the T-sql-queries SQL Queries Team label Dec 10, 2024
@yuzefovich yuzefovich removed this from SQL Queries Dec 10, 2024
@yuzefovich
Copy link
Member

[queries triage] if we understand correctly, right now it's unclear whether running auto stats has detrimental effect on the sysbench numbers overall, and as such we removed this from out project board. If there is detrimental effect, and Queries needs to look closer, please add it back to our project.

@michae2
Copy link
Collaborator

michae2 commented Dec 10, 2024

Note that the sql.stats.automatic_collection.max_fraction_idle cluster setting can be used to tune the throttling of automatic table statistics collection. Quoting from this slack conversation:

The throttling only comes into play when CPU usage > 25%. 0 = no throttling of stats collection, 1 = max throttling of stats collection. Default is 0.9.

@ajstorm
Copy link
Collaborator

ajstorm commented Dec 19, 2024

I still need to dig further into this one, but my theory is that auto stats is actually helpful to these runs because we're not manually collecting stats before the run. That being said, histogram collection seems to harm the runs because (again a guess) the histogram stats aren't helpful for this workload.

Will need to do more digging on this in the new year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. o-perf-efficiency Related to performance efficiency P-2 Issues/test failures with a fix SLA of 3 months
Projects
None yet
Development

No branches or pull requests

6 participants