
Scylla Manager regression in 3.3.2: segmentation fault #4028

Closed
rzetelskik opened this issue Sep 9, 2024 · 2 comments · Fixed by #4029

Comments

@rzetelskik
Member

rzetelskik commented Sep 9, 2024

Scylla Manager throws segmentation faults in both Scylla Operator CI suites involving Scylla Manager.

https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2089/pull-scylla-operator-master-e2e-gke-parallel/1833059833171939328
https://gcsweb.scylla-operator.scylladb.com/gcs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2089/pull-scylla-operator-master-e2e-gke-parallel/1833059833171939328/artifacts/must-gather/0/namespaces/scylla-manager/pods/scylla-manager-7fb4d59cfc-lpmtz/scylla-manager.previous

2024-09-09T09:09:22.233069447Z panic: runtime error: invalid memory address or nil pointer dereference
2024-09-09T09:09:22.233091297Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11da5ac]

Artifacts: https://gcsweb.scylla-operator.scylladb.com/gcs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2089/pull-scylla-operator-master-e2e-gke-parallel/1833059833171939328/


https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2089/pull-scylla-operator-master-e2e-gke-parallel-clusterip/1833059833192910848
https://gcsweb.scylla-operator.scylladb.com/gcs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2089/pull-scylla-operator-master-e2e-gke-parallel-clusterip/1833059833192910848/artifacts/must-gather/0/namespaces/scylla-manager/pods/scylla-manager-7fb4d59cfc-f5vw8/scylla-manager.previous

2024-09-09T09:03:55.901858409Z panic: runtime error: invalid memory address or nil pointer dereference
2024-09-09T09:03:55.901873239Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11da5ac]

Artifacts: https://gcsweb.scylla-operator.scylladb.com/gcs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2089/pull-scylla-operator-master-e2e-gke-parallel-clusterip/1833059833192910848/

ScyllaDB Manager version: 3.3.2
ScyllaDB version: tests run with OS 6.1.1 and Enterprise 2024.1.8
ScyllaDB Manager client version: tests run with 3.3.1

Because of this, we can't confirm that the supposed fix for #3989 is working either.

Xref: scylladb/scylla-operator#2089 (comment)

@Michal-Leszczynski
Collaborator

So it looks like it's possible to have a cluster without cluster.Host but with cluster.KnownHosts specified:

2024-09-09T13:31:37.901655596Z {"L":"ERROR","T":"2024-09-09T13:31:37.901Z","N":"cluster","M":"Cluster contact points","cluster ID":"09ed5d68-abff-4cc6-9e86-88617d5d60e0","host":"","known hosts":["10.66.193.122","10.66.193.67","10.66.193.77","10.66.193.93"],"_trace_id":"Os76x_XDSziKH63oR1HoUg","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).discoverClusterHosts\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:180\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).discoverAndSetClusterHosts\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:160\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:141\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:157\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:208"}

I'm not sure how that happened. Perhaps the cluster was added with --host and later updated without the --host flag?
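
For illustration only, here is a minimal, hypothetical Go sketch (not the actual Scylla Manager code) of how a cluster record with an empty Host but populated KnownHosts can end in exactly this kind of nil pointer dereference, if a later step assumes that a value derived from Host is non-nil:

package main

import "fmt"

// Cluster mirrors only the two fields visible in the log line above.
type Cluster struct {
	Host       string
	KnownHosts []string
}

type contactInfo struct{ datacenter string }

// pickContactInfo is a hypothetical helper that returns nil when Host is empty,
// e.g. because a lookup keyed by Host found nothing.
func pickContactInfo(c *Cluster) *contactInfo {
	if c.Host == "" {
		return nil // caller never falls back to KnownHosts
	}
	return &contactInfo{datacenter: "dc1"}
}

func main() {
	// Host left empty, KnownHosts populated, as in the log line above.
	c := &Cluster{KnownHosts: []string{"10.66.193.122"}}
	info := pickContactInfo(c)
	// Dereferencing the nil result panics with
	// "invalid memory address or nil pointer dereference" (SIGSEGV).
	fmt.Println(info.datacenter)
}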

Michal-Leszczynski added a commit that referenced this issue Sep 10, 2024
Commit 5bf6b35 introduced a bug: when adding a cluster to SM finished with an error,
the cluster was still saved in the SM DB, but only with the ID and known hosts fields.
That's because the validateHostsConnectivity method used discoverAndSetClusterHosts
instead of discoverClusterHosts, and that call happened before inserting the cluster into the SM DB.

Fixes #4028
@Michal-Leszczynski
Copy link
Collaborator

The root cause of this issue is that if adding a cluster to SM finished with an error, the cluster was still saved in the SM DB, but only with the ID and known hosts fields. That's because the validateHostsConnectivity method used discoverAndSetClusterHosts instead of discoverClusterHosts, and that call happened before inserting the cluster into the SM DB.

Older clusters (created before SM 3.2.6) that were not updated with the --host flag could also end up in the same state.
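
To make the difference concrete, here is a simplified, self-contained Go sketch of the flow described above. The function and field names are borrowed from the stack trace; the bodies are hypothetical. The point is that discoverAndSetClusterHosts persists the discovered hosts as a side effect, so calling it from validateHostsConnectivity before the full cluster record is inserted leaves a partial row (ID and known hosts only) behind when the add fails:

package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical, simplified types: only what the explanation above needs.
type Cluster struct {
	ID         string
	Host       string
	KnownHosts []string
}

type DB struct{ rows map[string]*Cluster }

// SaveKnownHosts writes a partial record keyed only by cluster ID.
func (d *DB) SaveKnownHosts(id string, hosts []string) {
	d.rows[id] = &Cluster{ID: id, KnownHosts: hosts}
}

type Service struct{ db *DB }

// discoverClusterHosts only queries the cluster for its live hosts (stubbed here).
func (s *Service) discoverClusterHosts(ctx context.Context, c *Cluster) ([]string, error) {
	return []string{"10.66.193.122", "10.66.193.67"}, nil
}

// discoverAndSetClusterHosts also persists the hosts: the side effect at the heart of the bug.
func (s *Service) discoverAndSetClusterHosts(ctx context.Context, c *Cluster) error {
	hosts, err := s.discoverClusterHosts(ctx, c)
	if err != nil {
		return err
	}
	c.KnownHosts = hosts
	s.db.SaveKnownHosts(c.ID, hosts)
	return nil
}

// AddCluster mimics the broken flow: validation persists known hosts, then the add
// fails, leaving a row with only ID and known hosts and an empty Host.
func (s *Service) AddCluster(ctx context.Context, c *Cluster) error {
	if err := s.discoverAndSetClusterHosts(ctx, c); err != nil { // bug: should only discover
		return err
	}
	return errors.New("connectivity check failed") // add fails after the partial write
}

func main() {
	db := &DB{rows: map[string]*Cluster{}}
	s := &Service{db: db}
	_ = s.AddCluster(context.Background(), &Cluster{ID: "09ed5d68", Host: "10.66.193.122"})
	fmt.Printf("%+v\n", db.rows["09ed5d68"]) // partial record: ID + known hosts, empty Host
}

As the commit message above describes, the fix is presumably to call discoverClusterHosts during validation instead, so nothing is persisted before the full cluster record is inserted into the SM DB.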

karol-kokoszka pushed a commit that referenced this issue Sep 11, 2024
karol-kokoszka pushed a commit that referenced this issue Sep 12, 2024