master: Update pkg/testutils/release/cockroach_releases.yaml #5

Open
wants to merge 87 commits into
base: master
from

Conversation

github-actions[bot]

@github-actions github-actions bot commented Sep 8, 2023

Update pkg/testutils/release/cockroach_releases.yaml with recent values.

Epic: None
Release note: None

@github-actions github-actions bot force-pushed the crdb-releases-yaml-update-master branch from 5decaa5 to d1ebbb3 Compare September 16, 2023 00:54
herkolategan and others added 29 commits September 27, 2023 11:40
Previously the `lenient` flag that allowed errors during microbenchmarks to be
tolerated would also result in the exit status being 0 even if errors occurred.

The error tolerance should only allow the run to continue, if errors are
encountered, but still report the failures by signalling an exit code 1 so that
failures can be tracked and reported on.

Release Note: None
Epic: None
This PR fixes the test scripts used by developers to quickly set up a
multitenant test environment. Changes made to cockroachdb since the
scripts were created had broken them.

Epic: none

Release note: None
Add `--flaky_test_attempts=4` to the coverage unit test builds. We
don't want flaky tests failing these builds often.

Epic: none
Release note: None
To make it easier to identify the metric being used to generate
charts, this commit adds the metric to the tooltip of all charts
on the Metrics page.

Fixes cockroachdb#109277

This also fixes the metric name for `Schema Registry Registrations`.

Fixes cockroachdb#108095

Release note (ui change): On the Metrics page, each chart's tooltip now
shows which metric is used to create the chart.

Release note (bug fix): Fix metric name for `Schema Registry Registrations`.
We recently introduced metrics into the logging package,
but unfortunately we did not prefix them properly.

All metrics in the logging package should share the same
`log.*` prefix, to clearly group them together.

Luckily, only 1 log metric exists so far. This patch
updates the metric name to have the `log.*` prefix.

Release note (ops change): This patch renames the metric
`fluent.sink.conn.errors` to `log.fluent.sink.conn.errors`.

The addition of the `log.` prefix was to better group
together logging-related metrics. The behavior and purpose
of the metric remains unchanged.
The Measurement metadata for this metric was incorrect.

This patch fixes it to better represent what's being
measured.

Release note: none
The log.fluent.sink.conn.errors metric's metadata
was missing the MetricType. This patch adds it.

Release note (ops change): This patch sets the Metric Type
on the metric `log.fluent.sink.conn.errors`. Previously, the
Metric Type was incorrectly left unset.

Note that this is simply an update to the metric's metadata.
The behavior and purpose of the metric remains unchanged.
Previously, a LogMetrics implementation was not provided
to the logging package in tests. This could cause tests
that exercise code paths involving LogMetrics to hit
problems such as nil pointer errors.

This patch assigns a dummy test implementation in the
testing log scope setup, to avoid this case.

Release note: none
Previously, the MetricsStruct used by the logmetrics
package was protected by the same mutex that protects
the map of metric name to counter.

However, the MetricsStruct is never written to after
initialization. It's only read within `NewRegistry()`
to dump the underlying counters into a new registry
for in-process tenants. Since concurrent writes are
not possible with this MetricsStruct (it's only read
from), protection by this mutex is unnecessary.

In fact, the unnecessary mutex protection can cause
a deadlock if a metric is incremented in the hot path
for logging (e.g. once per log message as it passes
through `outputLogEntry`).

For example:
1. NewRegistry is called
2. NewRegistry acquires and holds mutex
3. NewRegistry initializes a new registry, which
eventually [makes a logging call](https://github.com/cockroachdb/cockroach/blob/master/pkg/util/metric/registry.go#L87)
4. Logging call makes its way through the logging
code and attempts to increment a logmetrics counter.
5. `IncrementCounter` is called.
6. `IncrementCounter` attempts to acquire the mutex.
7. The mutex is already being held via step 2.
8. Deadlock!

By removing the unnecessary protection of the mutex
for the MetricsStruct, we eliminate this possibility.
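The shape of the fix can be sketched as below. This is an illustrative model under stated assumptions, not the logmetrics package: `logMetrics`, `incrementCounter`, and `newRegistry` are hypothetical stand-ins, and the logging call made while building the registry is simulated by a callback.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter is a stand-in for a metric counter.
type counter struct{ n int64 }

func (c *counter) inc()         { atomic.AddInt64(&c.n, 1) }
func (c *counter) value() int64 { return atomic.LoadInt64(&c.n) }

// logMetrics is a hypothetical model of the structure described above.
type logMetrics struct {
	// all is written once at construction and only read afterwards,
	// so it is deliberately not protected by mu.
	all []*counter

	mu     sync.Mutex
	byName map[string]*counter // protected by mu
}

func newLogMetrics(names ...string) *logMetrics {
	lm := &logMetrics{byName: make(map[string]*counter)}
	for _, name := range names {
		c := &counter{}
		lm.all = append(lm.all, c)
		lm.byName[name] = c
	}
	return lm
}

// incrementCounter takes the mutex to look the counter up by name.
func (lm *logMetrics) incrementCounter(name string) {
	lm.mu.Lock()
	c := lm.byName[name]
	lm.mu.Unlock()
	if c != nil {
		c.inc()
	}
}

// newRegistry copies the counters into a new slice. It does not hold mu,
// so the logging call made along the way can re-enter incrementCounter
// without deadlocking.
func (lm *logMetrics) newRegistry(logf func(msg string)) []*counter {
	reg := make([]*counter, 0, len(lm.all))
	for _, c := range lm.all {
		logf("registering counter") // simulates the registry's logging call
		reg = append(reg, c)
	}
	return reg
}

func main() {
	lm := newLogMetrics("log.messages.count")
	logf := func(string) { lm.incrementCounter("log.messages.count") }
	reg := lm.newRegistry(logf)
	fmt.Println(len(reg), reg[0].value())
}
```

If `newRegistry` held `mu` for its whole body, the callback would block forever on the second `Lock`, because `sync.Mutex` is not reentrant.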

Release note: none
Buffered network logging sinks have a `max-buffer-size` attribute,
which determines, in bytes, how many log messages can be buffered.

If a writer attempts to append a log message to the buffer that
would exceed this `max-buffer-size`, then the buffered log sink
logic drops older messages to make room for the new.

Previously, these dropped messages were not tracked in any way.
A TODO was left to add a metric tracking them.

This patch introduces a metric to do so:
`log.buffered.messages.dropped`

It's shared across all buffered log sinks and counts the number
of messages dropped from the buffer.
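The drop-oldest behaviour and its counter can be sketched as follows. This is a simplified model, not the buffered sink implementation: `boundedBuffer` and its fields are hypothetical, the counter is a plain integer standing in for `log.buffered.messages.dropped`, and dropping a message that is larger than the whole buffer is an assumption of this sketch.

```go
package main

import "fmt"

// boundedBuffer holds messages up to maxBytes in total size.
// (Hypothetical type; not safe for concurrent use.)
type boundedBuffer struct {
	maxBytes int
	curBytes int
	msgs     []string
	dropped  int64 // stand-in for log.buffered.messages.dropped
}

// add appends msg, dropping the oldest buffered messages until it fits.
func (b *boundedBuffer) add(msg string) {
	if len(msg) > b.maxBytes {
		// Assumption of this sketch: a message that can never fit
		// is itself dropped and counted.
		b.dropped++
		return
	}
	for b.curBytes+len(msg) > b.maxBytes {
		b.curBytes -= len(b.msgs[0])
		b.msgs = b.msgs[1:]
		b.dropped++
	}
	b.msgs = append(b.msgs, msg)
	b.curBytes += len(msg)
}

func main() {
	buf := &boundedBuffer{maxBytes: 10}
	for _, m := range []string{"aaaa", "bbbb", "cccc"} {
		buf.add(m)
	}
	fmt.Println(buf.msgs, buf.dropped)
}
```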

Release note (ops change): This patch introduces a new metric,
`log.buffered.messages.dropped`.

Buffered network logging sinks have a `max-buffer-size` attribute,
which determines, in bytes, how many log messages can be buffered.
Any `fluent-server` or `http-server` log sink that makes use of
a `buffering` attribute in its configuration (enabled by default)
qualifies as a buffered network logging sink.

If this buffer becomes full, and an additional log message is sent
to the buffered log sink, the buffer would exceed this
`max-buffer-size`. Therefore, the buffered log sink drops older
messages in the buffer to make room for the new one.

`log.buffered.messages.dropped` counts the number of messages
dropped from the buffer. Note that the count is shared across all
buffered logging sinks.
Release note: None
Prior to this patch, when a virtual cluster was created without a
name, a default name was generated with structure `tenant-NNN`. To
avoid emphasizing multi-tenancy, this commit changes this to
`cluster-NNN`.

(No release note because there is no user-facing way to create a
record without a name.)

Release note: None
Multiple people have seen this timeout for `race`. Let's bump this
timeout only for `race`.

Epic: none
Release note: None
This change adds a cluster setting,
`kv.snapshot_receiver.excise.enabled`, to use IngestAndExcise
for the replicated/user-key portion of a replica's contents
instead of rangedels. This reduces write-amp as
rangedels/rangekeydels have to be compacted while an excise
shrinks sstables into virtual sstables to clear out contents
of a replica immediately. At the moment, this is an experimental
feature and should be used with caution.

Epic: none

Release note: None
111576: sql: use 'cluster-NNN' for virtual cluster records without a name r=stevendanna a=knz

Epic: CRDB-29380

Prior to this patch, when a virtual cluster was created without a name, a default name was generated with structure `tenant-NNN`. To avoid emphasizing multi-tenancy, this commit changes this to `cluster-NNN`.

(No release note because there is no user-facing way to create a record without a name.)

Release note: None

111586: settings: more guidance r=dt a=knz

Epic: CRDB-6671

As requested by `@dt` [here](cockroachdb#111579 (comment)).

Release note: None

111587: configprofiles: more clamping down on spurious slice overwrites r=yuzefovich a=knz

Epic: CRDB-26691

Suggested by `@yuzefovich` [here](cockroachdb#111569 (review)).


Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
When there are multiple shared locks on a key, any active waiters will
push the first of the lock holders (aka the claimant). Previously, when
the claimant was finalized, we weren't recomputing the waiting state
for any active waiters to push the new claimant. As a result, in such
a scenario, waiters would end up blocking indefinitely without pushing.

This is non-ideal, as it means we're not going to be running
deadlock/liveness detection. Waiters would hang indefinitely if there
was a deadlock/liveness issue. This patch fixes this behaviour by
recomputing new waiting state in cases where a shared lock is released
but the key isn't unlocked.

Epic: none

Release note: None
This patch introduces the `log.messages.count` metric.

The metric counts the number of messages logged, recording
at the point of `outputLogEntry`, which all logging calls
(e.g. `Info`, `Error`, etc.) commonly pass through.

This metric will be helpful to better understand log
volume and rates.

Note that this does not capture the fanout of a single
log message to multiple logging sinks.
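The counting point can be illustrated with a small model. The types below are hypothetical and only mirror the description: every logging call funnels through one function, which counts the message once before fanning it out to the sinks.

```go
package main

import "fmt"

// sink is a hypothetical logging destination.
type sink interface{ write(msg string) }

type memSink struct{ lines []string }

func (m *memSink) write(msg string) { m.lines = append(m.lines, msg) }

// logger is a simplified model; it is not safe for concurrent use.
type logger struct {
	sinks         []sink
	messagesCount int64 // stand-in for log.messages.count
}

// outputLogEntry is the single point that all logging calls pass through.
// The message is counted once here, regardless of how many sinks get it.
func (l *logger) outputLogEntry(msg string) {
	l.messagesCount++
	for _, s := range l.sinks {
		s.write(msg)
	}
}

func (l *logger) Info(msg string)  { l.outputLogEntry("I " + msg) }
func (l *logger) Error(msg string) { l.outputLogEntry("E " + msg) }

func main() {
	a, b := &memSink{}, &memSink{}
	l := &logger{sinks: []sink{a, b}}
	l.Info("started")
	l.Error("disk full")
	fmt.Println(l.messagesCount, len(a.lines)+len(b.lines))
}
```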

Release note (ops change): This patch introduces the metric,
`log.messages.count`.

This metric measures the count of messages logged on the
node since startup. Note that this does not measure the
fan-out of single log messages to the various configured
logging sinks.

This metric can be helpful in understanding log rates and
volumes.
This is a small optimization made to the logmetrics package.
The log package previously provided a metric name string when
incrementing a metric, which would prompt the logmetrics package
to perform a map lookup. By using enum values instead, we can
do direct index lookups instead.

These log metrics are in the critical logging path, so these
types of optimizations are worthwhile, especially when the
effort is low (like here).
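A minimal sketch of the idea, assuming one counter per metric: the enum constants and helper names are hypothetical, while the metric names are the ones mentioned in this pull request.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metricID enumerates the log metrics. (Hypothetical identifiers.)
type metricID int

const (
	fluentSinkConnErrors metricID = iota
	bufferedMessagesDropped
	messagesCount
	numMetrics // must stay last; it sizes the arrays below
)

// metricNames maps each enum value to its exported metric name.
var metricNames = [numMetrics]string{
	fluentSinkConnErrors:    "log.fluent.sink.conn.errors",
	bufferedMessagesDropped: "log.buffered.messages.dropped",
	messagesCount:           "log.messages.count",
}

// counters is indexed directly by metricID, so incrementing needs
// neither a map lookup nor a lock.
var counters [numMetrics]int64

func increment(id metricID, delta int64) { atomic.AddInt64(&counters[id], delta) }

func get(id metricID) int64 { return atomic.LoadInt64(&counters[id]) }

func main() {
	increment(messagesCount, 1)
	increment(messagesCount, 1)
	increment(bufferedMessagesDropped, 3)
	for id := metricID(0); id < numMetrics; id++ {
		fmt.Println(metricNames[id], get(id))
	}
}
```

An array index is a bounds check plus an offset, whereas a string-keyed map lookup has to hash the key on every call, which matters on a path executed for every log message.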

Release note: none
This commit adds a program that takes an output directory path, collects
all statements in all logic tests, and writes them, per file, to the
provided output directory.

Release note: None
This commit adds a nightly task in TC that collects statements in all
logic tests and stores them in Google Cloud under
`cockroach-corpus/logictest-stmts-corpus/`.

Release note: None
111571: tests: silence some warnings r=yuzefovich a=knz

This will improve investigations for failures like cockroachdb#111541.

Epic: CRDB-18499.

111590: github-pull-request-make: longer overall timeout for `stressrace` r=jlinder a=rickystewart

Multiple people have seen this timeout for `race`. Let's bump this timeout only for `race`.

Epic: none
Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
For some reason after an update GoLand stopped compiling because of
this.

Epic: None

Release note: None
Previously, StartSharedProcessTenant() would hang if it were run on a tenant
that was created by a replication stream. This patch fixes this bug by ensuring
`ALTER TENANT $1 START SERVICE SHARED` is run even if the tenant was already
created.

Epic: none

Release note: None
erikgrinaker and others added 29 commits October 3, 2023 11:22
111613: sqlstats: fix counter for in-memory fingerprints r=j82w a=j82w

Problem:
The counters used to track the number of unique fingerprints we store in-memory for sql stats were refactored in cockroachdb#110805. That change introduced a bug where the counters were increased instead of being reset. This causes the statistics to stop calculating new stats once the limit is hit.

Solution:
Fix the bug by resetting the counters instead of increasing them. Added new test to test the reset functionality.

Fixes: cockroachdb#111583

Release note (sql change): Fix a bug that causes the sql stats to stop collecting new stats.

Co-authored-by: j82w <jwilley@cockroachlabs.com>
This renames the setting `kv.raft_log.synchronization.disabled` to
`kv.raft_log.synchronization.unsafe.disabled` as per naming
guidelines, and marks it as unsafe explicitly.

Release note: None
Prior to this patch, it was possible to easily automate `SET CLUSTER
SETTING` for unsafe cluster settings. This is undesirable; we want to
strongly incentivize a human operator paying attention to changes to
these settings.

This patch implements an *interlock*: a mechanism through which the
operator needs to perform two concurrent, related actions for the
change to take effect.

This works as follows:

1. the operator attempts to change a cluster setting from a SQL shell,
   for example:

   ```sql
   SET CLUSTER SETTING kv.raft_log.synchronization.unsafe.disabled = true;
   ```

2. the server fails the execution, with an error:

   ```
   ERROR: changing cluster setting
   "kv.raft_log.synchronization.unsafe.disabled" may cause cluster
   instability or data corruption. To confirm the change, run the
   following command before trying again:

   SET unsafe_setting_interlock_key = 'B7TxIA==';

   ```

3. the operator can then perform the recommended action, then
   try SET CLUSTER SETTING again. Because the key is properly
   set, the SET CLUSTER SETTING statement succeeds.

Also, `RESET` statements (or `SET CLUSTER SETTING ... = DEFAULT`) are
not subject to the interlock, as we assume that the default value is
safe for use.
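The control flow of the interlock can be sketched as below. This is an illustration only: the source does not say how the key is derived, so hashing the session ID together with the setting name is purely an assumption, and `session`, `expectedKey`, and `setClusterSetting` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// session models a SQL session. interlockKey holds the value of the
// unsafe_setting_interlock_key session variable.
type session struct {
	id           string
	interlockKey string
}

// expectedKey derives the key the operator must set.
// Assumption: the real derivation is not described in this PR.
func expectedKey(sessionID, setting string) string {
	sum := sha256.Sum256([]byte(sessionID + "/" + setting))
	return base64.StdEncoding.EncodeToString(sum[:4])
}

// setClusterSetting rejects changes to unsafe settings unless the session
// carries the matching key. Resets to the default bypass the interlock.
func setClusterSetting(s *session, setting string, isUnsafe, isReset bool) error {
	if isUnsafe && !isReset {
		want := expectedKey(s.id, setting)
		if s.interlockKey != want {
			return fmt.Errorf(
				"changing cluster setting %q may cause cluster instability or data corruption; "+
					"to confirm, run SET unsafe_setting_interlock_key = '%s' before trying again",
				setting, want)
		}
	}
	return nil // the setting would be applied here
}

func main() {
	const setting = "kv.raft_log.synchronization.unsafe.disabled"
	s := &session{id: "session-1"}

	fmt.Println("first attempt:", setClusterSetting(s, setting, true, false))

	s.interlockKey = expectedKey(s.id, setting) // the operator runs the suggested SET
	fmt.Println("second attempt:", setClusterSetting(s, setting, true, false))
}
```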

(No release note because the only unsafe settings as of this writing
are not documented to end-users.)

Release note: None
109801: sql: implement an interlock to modify unsafe settings r=dt a=knz

Fixes cockroachdb#109810.
Epic: CRDB-28893

As discussed [here](https://docs.google.com/document/d/11mWsfORExZxKqyMJfa6vg7LUzLEYhJ295NkaP1-bvL4/edit?disco=AAAA3lp44WY).

Prior to this patch, it was possible to easily automate `SET CLUSTER
SETTING` for unsafe cluster settings. This is undesirable; we want to
strongly incentivize a human operator paying attention to changes to
these settings.

This patch implements an *interlock*: a mechanism through which the
operator needs to perform two concurrent, related actions for the
change to take effect.

This works as follows:

1. the operator attempts to change a cluster setting from a SQL shell,
   for example:

   ```sql
   SET CLUSTER SETTING kv.raft_log.synchronization.unsafe.disabled = true;
   ```

2. the server fails the execution, with an error:

   ```
   ERROR: changing cluster setting
   "kv.raft_log.synchronization.unsafe.disabled" may cause cluster
   instability or data corruption. To confirm the change, run the
   following command before trying again:

   SET unsafe_setting_interlock_key = 'B7TxIA==';

   ```

3. the operator can then perform the recommended action, then
   try SET CLUSTER SETTING again. Because the key is properly
   set, the SET CLUSTER SETTING statement succeeds.

Also, `RESET` statements (or `SET CLUSTER SETTING ... = DEFAULT`) are
not subject to the interlock, as we assume that the default value is
safe for use.

111336: roachprod-microbench: update error tolerance r=renatolabs,srosenberg a=herkolategan

Previously the `lenient` flag that allowed errors during microbenchmarks to be tolerated would also result in the exit status being 0 even if errors occurred.

The error tolerance should only allow the run to continue, if errors are encountered, but still report the failures by signalling an exit code 1 so that failures can be tracked and reported on.

Release Note: None
Epic: None

111639: kvserver: skip `TestStoreRangeMergeRaftSnapshot` under metamorphic tests r=erikgrinaker a=erikgrinaker

Touches cockroachdb#111624.
Epic: none
Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
111594: sql-telemetry: query_sampling.max_event_frequency public r=maryliag a=emilaleksanteri

Epic: none
Fixes: cockroachdb#108385

Release note (sql change): Made `max_event_frequency` public so that it appears in public documentation.

111638: kvfollowerreadsccl: use `SystemVisible` for `kv.closed_timestamp.propagation_slack` r=erikgrinaker a=erikgrinaker

**changefeedccl: don't use `ALTER TENANT ALL` for closed timestamp setting**

This is no longer necessary with the `SystemVisible` class.

**kvfollowerreadsccl: use `SystemVisible` for `kv.closed_timestamp.propagation_slack`**

It doesn't make any sense to configure this individually per tenant.

Epic: none
Release note: None


Co-authored-by: Emil Lystimaki <emil@circularway.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Epic: none

Release note: None
This test is currently only used for manual benchmarking. It will be
revisited later as part of
cockroachdb#111614.

We should skip it to avoid noise in test failures.

Fixes: cockroachdb#111542.

Release note: None
Fix test to use correct config for injecting invalid lease indexes.

Epic: none

Release note: None
111645: roachtest: show running test in teamcity logs r=smg260 a=smg260

In the TC log, we currently show when a test has finished. Now that stderr/out has been cleaned up, it would be useful to also show when a test has begun running. We already do this in GCE (with a grafana link). 

Epic: none

Release note: None

Co-authored-by: Miral Gadani <miral@cockroachlabs.com>
Epic: none

This change pins `pnpm` to `8.6.10` for the cluster-ui
release (and release-next) workflow(s) to prevent out-of-date
lockfiles when installing cluster-ui dependencies with pnpm.

Release note: None
111584: roachtest: add ruby-pg test to ignorelist r=rafiss a=rafiss

fixes cockroachdb#111522
fixes cockroachdb#111508

Release note: None

111588: build: remove uses of `bindata` r=rail,srosenberg a=rickystewart

This is deprecated in `rules_go`, and `go:embed` has the same functionality.

Closes cockroachdb#111520.
Epic: CRDB-8308
Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
111655: kvnemesis: use correct probability for invalid lease r=pavelkalinnikov a=aliher1911

Fix test to use correct config for injecting invalid lease indexes.

Epic: none

Release note: None

Co-authored-by: Oleg Afanasyev <oleg@cockroachlabs.com>
Add the ability to choose a color for specific metric series on
Metric charts.

On the Replication -> Ranges chart, specify the color so that the
`Unavailable` series is red. Otherwise it could be confusing to see
another series in red and `Unavailable` in green.

Fixes cockroachdb#107637

Release note: None
…ings

Fixes cockroachdb#111626

The previous impl assumed input string length <= math.MaxInt32. Go 1.20 added
unsafe.StringData (https://pkg.go.dev/unsafe#StringData) which properly handles
longer strings. This changes the impl to use unsafe.StringData and adds a unit
test.
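A zero-copy conversion built on `unsafe.StringData` can look like the sketch below (Go 1.20 or later). The function name is hypothetical and this is not the code from the `encoding` package; it only shows the standard-library calls involved.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytesNoCopy returns a byte slice that aliases the bytes of s
// without copying. The result must never be modified, because Go strings
// are immutable. (Hypothetical name.)
func stringToBytesNoCopy(s string) []byte {
	if len(s) == 0 {
		// unsafe.StringData is unspecified for the empty string.
		return nil
	}
	// The length is passed as an int, so no 32-bit size limit is baked in.
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytesNoCopy("hello")
	fmt.Println(len(b), string(b))
}
```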

Release note (bug fix): Fixed a panic that could occur if a query uses a string
larger than 2^31-1 bytes.
When creating a new cluster, this moves the initialisation
of the log with retry number to the top of the loop,
so that we can pass in the log reference to the `clusterImpl`.

Without this, new clusters are susceptible to a nil pointer.

This surfaced when testing Azure cloud, since not all provider
functions are implemented, and a log statement is issued.

Epic: none
Release note: none
111615: roachtest: skip admission-control/index-backfill from weekly runs r=sumeerbhola a=aadityasondhi

This test is currently only used for manual benchmarking. It will be revisited later as part of
cockroachdb#111614.

We should skip it to avoid noise in test failures.

Fixes: cockroachdb#111542.

Release note: None

Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
110943: kvserver,storage: ingest small snapshot as writes r=itsbilal,erikgrinaker a=sumeerbhola

Small snapshots overload the LSM: they cause many tiny memtable flushes, which produce a high sub-level count that must then be compensated for by many inefficient compactions from L0 to Lbase. Despite some compaction scoring changes, we have not been able to fully eliminate the impact of this on foreground traffic, as discussed in cockroachdb/pebble#2832 (comment).

Fixes cockroachdb#109808

Epic: none

Release note (ops change): The cluster setting
kv.snapshot.ingest_as_write_threshold controls the size threshold below which snapshots are converted to regular writes. It defaults to 100KiB.
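The decision that the setting controls can be sketched as a simple size check. The names below are hypothetical; only the setting name and the 100KiB default come from the release note above.

```go
package main

import "fmt"

// defaultIngestAsWriteThreshold mirrors the documented default of
// kv.snapshot.ingest_as_write_threshold (100KiB).
const defaultIngestAsWriteThreshold int64 = 100 << 10

// applyMode says how a received snapshot is applied. (Hypothetical type.)
type applyMode int

const (
	ingestSSTs applyMode = iota
	regularWrites
)

// chooseApplyMode converts snapshots smaller than the threshold into
// regular writes and ingests everything else as sstables.
func chooseApplyMode(snapshotBytes, threshold int64) applyMode {
	if snapshotBytes < threshold {
		return regularWrites
	}
	return ingestSSTs
}

func main() {
	for _, size := range []int64{4 << 10, 100 << 10, 64 << 20} {
		mode := chooseApplyMode(size, defaultIngestAsWriteThreshold)
		fmt.Printf("%d bytes -> regular writes: %t\n", size, mode == regularWrites)
	}
}
```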

111627: encoding: fix UnsafeConvertStringToBytes to work with large input strings r=ecwall a=ecwall

Fixes cockroachdb#111626

The previous impl assumed input string length <= math.MaxInt32. Go 1.20 added unsafe.StringData (https://pkg.go.dev/unsafe#StringData) which properly handles longer strings. This changes the impl to use unsafe.StringData and adds a unit test.

Release note (bug fix): Fixed a panic that could occur if a query uses a string
larger than 2^31-1 bytes.

111656: cluster-ui: pin `pnpm` to `8.6.10` for cluster-ui-release workflow r=THardy98 a=THardy98

Epic: none
This change pins `pnpm` to `8.6.10` for the cluster-ui release (and release-next) workflow(s) to prevent out-of-date lockfiles when installing cluster-ui dependencies with pnpm.

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com>
111467: ui: allow custom color on metric r=maryliag a=maryliag

Add the ability to choose a color for specific metric series on
Metric charts.

On the Replication -> Ranges chart, specify the color so that the
`Unavailable` series is red. Otherwise it could be confusing to see
another series in red and `Unavailable` in green.

Fixes cockroachdb#107637

Release note: None

Before
<img width="857" alt="Screenshot 2023-09-28 at 7 49 55 PM" src="https://github.com/cockroachdb/cockroach/assets/1017486/88af7e07-3c58-463d-963c-d47a3dd3f7c3">


After
<img width="897" alt="Screenshot 2023-10-03 at 12 27 00 PM" src="https://github.com/cockroachdb/cockroach/assets/1017486/cfe3c2f8-4038-412f-908c-3ca51a82d720">


Release note: None

111598: sql: support SHOW GRANTS ON PROCEDURE r=mgartner a=mgartner

Epic: CRDB-25388

Release note: None

111642: kvserver: use `SystemVisible` for `kv.raft.command.max_size` r=erikgrinaker a=erikgrinaker

Epic: none
Release note: None

Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
111535: build: coverage: retry flaky unit tests r=RaduBerinde a=RaduBerinde

Add `--flaky_test_attempts=4` to the coverage unit test builds. We
don't want flaky tests failing these builds often.

Epic: none
Release note: None

111633: roachtest: revert harmonize GCE and AWS machine types r=RaduBerinde,erikgrinaker a=srosenberg

Revert the change to machine types in [1] until
after 23.2 branch is cut.

[1] cockroachdb#111140

Epic: none

Release note: None

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Stan Rosenberg <stan.rosenberg@gmail.com>
The test regularly took about 7m under race. This commit drops down the
size of the test under race so that it runs in about the same time as the
non-race test.

Epic: None
Release note: None
111680: kv: speed up `TestNewVsInvariants` under race r=nvanbenschoten a=nvanbenschoten

The test regularly took about 7m under race. This commit drops down the size of the test under race so that it runs in about the same time as the non-race test.

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
The stats in TeamCity show that the time it takes to run all the tests
in this package can frequently get very close to the existing timeout.

Release note: None
111519: sqlproxyccl: fix test scripts r=darinpp a=darinpp

This PR fixes the test scripts used by developers to quickly set up a multitenant test environment. Changes made to cockroachdb since the scripts were created had broken them.

Epic: none

Release note: None

111669: roachtest: avoid nil logger r=renatolabs,srosenberg a=smg260

When creating a new cluster, this moves the initialisation of the log with retry number to the top of the loop, so that we can pass in the log reference to the `clusterImpl`.

Without this, new clusters are susceptible to a nil pointer.

This surfaced when testing Azure cloud, since not all provider functions are implemented, and a log statement is issued.

Epic: none
Release note: none

111682: ttljob: increase test timeout r=rafiss a=rafiss

The stats in TeamCity show that the time it takes to run all the tests in this package can frequently get very close to the existing timeout.

fixes cockroachdb#111364
Release note: None

Co-authored-by: Darin Peshev <darinp@gmail.com>
Co-authored-by: Miral Gadani <miral@cockroachlabs.com>
Co-authored-by: Miral Gadani <25202158+smg260@users.noreply.github.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Update pkg/testutils/release/cockroach_releases.yaml with recent values.

Epic: None
Release note: None
@cameronnunez cameronnunez force-pushed the crdb-releases-yaml-update-master branch from d1ebbb3 to 8013a97 Compare October 4, 2023 16:26