release-23.2: roachprod: split start-up script logic for disks #130119
Previously, the start-up script for GCE VMs decided whether it should run by checking for `/mnt/data1/.roachprod-initialized`. This logic fails when the disks are ephemeral. The VM can also fail to boot entirely: a new disk is attached but not formatted, and its `fstab` entry lacks the `nofail` option.

This change adds the `nofail` option. It also splits the start-up script into parts, to handle the case where the OS was previously initialized but the disks were not. OS initialization is now tracked through `/.roachprod-initialized`, and disk initialization through `/mnt/data1/.roachprod-initialized`. The `fstab` entries are no longer written a second time, because they are part of OS initialization.

Fixes: cockroachdb#122094
Epic: None
Release Note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this backport.
Force-pushed from 503167b to abe6e00.
```go
AttachedDiskLun *int // Use attached disk, with specified LUN; Use local ssd if nil.
// TODO(DarrylWong): In the future, when all tests are run on Ubuntu 22.04, we can remove this check and default true.
// See: https://github.com/cockroachdb/cockroach/issues/112112
IsUbuntu22 bool // Allow RSA SHA1 to be used and create tcpdump symlink.
```
This is still present in the start-up script. Should we remove it, or will that cause issues?
See below:

```
{{ if .IsUbuntu22 }}
sudo sh -c 'echo "PubkeyAcceptedAlgorithms +ssh-rsa" >> /etc/ssh/sshd_config'
{{ end }}
```
Good catch! I wasn't entirely sure whether any roachtests on this branch use an older version of Ubuntu. Indeed, some do:

```
pkg/cmd/roachtest/tests/ruby_pg.go: Cluster: r.MakeClusterSpec(1, spec.UbuntuVersion(vm.FocalFossa)),
pkg/cmd/roachtest/tests/disk_stall.go: Cluster: r.MakeClusterSpec(4, spec.ReuseNone(), spec.DisableLocalSSD(), spec.UbuntuVersion(vm.FocalFossa)),
```

Reverting that change.
Force-pushed from abe6e00 to 5800da3.
Previously, `Wipe` deleted the `roachprod` marker file `.roachprod-initialized`, which the start-up script creates in `/mnt/data1`. The `Wait` operation requires this file, and `SetupSSH` and other operations rely on `Wait` to function correctly. As a result, several operations became problematic after running `Wipe` on a cluster. This change excludes the file from deletion.

Epic: None
Release Note: None
Force-pushed from 5800da3 to d3bc2ef.
TFTR!
fix for backport: cockroachdb#130119
Backport:
- … "roachprod: exclude `.roachprod-initialized` from wipe" (#122522)

Please see individual PRs for details.
/cc @cockroachdb/release
Release justification: test-only changes