PWX-38297 #1656

Open
wants to merge 23 commits into
base: hk-feature-clusterDiags
23 commits
a8fb389
PWX-25414: Parallel/smart upgrades with minimum app downtime (#1629)
nikita-bhatia Aug 12, 2024
0b87a72
PWX-38512 Skip token refresh verification if host-pid not enabled (#1…
ssz1997 Aug 12, 2024
9bc256c
PWX-38583 : fix nil pointer dereference in operator pod (#1643)
nikita-bhatia Aug 14, 2024
f37e98f
PWX-38545 : Failed to create AlertManager object with prometheus-ope…
nikita-bhatia Aug 15, 2024
35b6e99
fix go.mod for 24.2.0 (#1644)
nikita-bhatia Aug 15, 2024
30ed9ab
PWX-38596 : After disabling PX Security STC goes into DEGRADED state …
nikita-bhatia Aug 19, 2024
2322be0
PWX-38372 Cherry-picking diags code from master (#1638)
kachaudhary-px Aug 13, 2024
428e59d
PWX-38098 Implement the entrypoint for collecting pod logs (#1650)
kachaudhary-px Aug 20, 2024
bc9fd72
modified pod template to include pod logs
kachaudhary-px Aug 20, 2024
d936549
creating pod template and pod to collect portworx pod logs
kachaudhary-px Aug 21, 2024
5412f11
fixup!
kachaudhary-px Aug 27, 2024
a5fddb3
adding the status for pod logs collection
kachaudhary-px Aug 29, 2024
2893826
fixup for adding pod logs status
kachaudhary-px Aug 29, 2024
2c72c70
adding the auto-updated zz_generated.deepcopy.go file
kachaudhary-px Aug 29, 2024
055adce
fixup! fixup!
kachaudhary-px Aug 30, 2024
304528a
rebasing on my feature branch
kachaudhary-px Sep 2, 2024
d1940ff
fixup for updating the status of pod logs collection
kachaudhary-px Sep 4, 2024
645c051
updating the overall phase and message
kachaudhary-px Sep 4, 2024
75bfe0b
Modified the pod creation conditions for pod logs
kachaudhary-px Sep 4, 2024
dda75b5
different labels for node diags and pod logs
kachaudhary-px Sep 5, 2024
3e63bc8
cleaning up the code
kachaudhary-px Sep 5, 2024
f4bc340
fixup to maintain the consistency in the overall status
kachaudhary-px Sep 7, 2024
f7842a1
rebasing on 24.2.0
kachaudhary-px Sep 9, 2024
29 changes: 29 additions & 0 deletions cmd/operator/operator.go
@@ -32,6 +32,7 @@ import (
	"github.com/libopenstorage/operator/drivers/storage"
	_ "github.com/libopenstorage/operator/drivers/storage/portworx"
	"github.com/libopenstorage/operator/pkg/apis"
	"github.com/libopenstorage/operator/pkg/controller/portworxdiag"
	"github.com/libopenstorage/operator/pkg/controller/storagecluster"
	"github.com/libopenstorage/operator/pkg/controller/storagenode"
	_ "github.com/libopenstorage/operator/pkg/log"
@@ -52,6 +53,7 @@ const (
	flagEnableProfiling = "pprof"
	flagDisableCacheFor = "disable-cache-for"
	defaultLockObjectName = "openstorage-operator"
	flagEnableDiagController = "diag-controller"
	defaultResyncPeriod = 30 * time.Second
	defaultMetricsPort = 8999
	defaultPprofPort = 6060
@@ -110,6 +112,10 @@ func main() {
			Name: flagEnableProfiling,
			Usage: "Enable Portworx Operator profiling using pprof (default: false)",
		},
		cli.BoolFlag{
			Name: flagEnableDiagController,
			Usage: "Enable Portworx Diag Controller (default: false)",
		},
		cli.StringFlag{
			Name: flagDisableCacheFor,
			Usage: "Comma separated object types to disable from cache to reduce memory usage, for example \"Pod,ConfigMap,Deployment,PersistentVolume\"",
@@ -146,6 +152,8 @@ func run(c *cli.Context) {
		}()
	}

	diagControllerEnabled := c.Bool(flagEnableDiagController)

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("Error getting cluster config: %v", err)
@@ -183,6 +191,15 @@ func run(c *cli.Context) {
		log.Fatalf("Error registering CRD's for StorageNode controller: %v", err)
	}

	var diagController portworxdiag.Controller
	if diagControllerEnabled {
		diagController = portworxdiag.Controller{Driver: d}
		err = diagController.RegisterCRD()
		if err != nil {
			log.Fatalf("Error registering CRDs for PortworxDiag controller: %v", err)
		}
	}

	// TODO: Don't move createManager above register CRD section. This part will be refactored because of a bug,
	// similar to https://github.com/kubernetes-sigs/controller-runtime/issues/321
	mgr, err := createManager(c, config)
@@ -256,6 +273,12 @@ func run(c *cli.Context) {
		log.Fatalf("Error initializing storage node controller: %v", err)
	}

	if diagControllerEnabled {
		if err := diagController.Init(mgr); err != nil {
			log.Fatalf("Error initializing portworx diag controller: %v", err)
		}
	}

	if err := storageClusterController.StartWatch(); err != nil {
		log.Fatalf("Error start watch on storage cluster controller: %v", err)
	}
@@ -264,6 +287,12 @@ func run(c *cli.Context) {
		log.Fatalf("Error starting watch on storage node controller: %v", err)
	}

	if diagControllerEnabled {
		if err := diagController.StartWatch(); err != nil {
			log.Fatalf("Error starting watch on portworx diag controller: %v", err)
		}
	}

	if c.BoolT(flagMigration) {
		log.Info("Migration is enabled")
		migrationHandler := migration.New(&storageClusterController)
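The operator.go changes above gate three lifecycle calls (RegisterCRD, Init, StartWatch) behind the `diag-controller` flag. The following is a minimal, self-contained sketch of that gating pattern, not the operator's actual code: the `optionalController` interface, the `recorder` stand-in, and `startIfEnabled` are hypothetical names introduced for illustration. In the real diff `Init` takes the manager and the three calls are interleaved with the other controllers' setup rather than run back to back.

```go
package main

import (
	"errors"
	"fmt"
)

// optionalController is a hypothetical interface that mirrors the three
// lifecycle calls the diff makes on the diag controller.
type optionalController interface {
	RegisterCRD() error
	Init() error
	StartWatch() error
}

// recorder is a stand-in controller that records which steps ran and can be
// told to fail on one of them.
type recorder struct {
	steps  []string
	failOn string
}

func (r *recorder) step(name string) error {
	r.steps = append(r.steps, name)
	if name == r.failOn {
		return errors.New(name + " failed")
	}
	return nil
}

func (r *recorder) RegisterCRD() error { return r.step("RegisterCRD") }
func (r *recorder) Init() error        { return r.step("Init") }
func (r *recorder) StartWatch() error  { return r.step("StartWatch") }

// startIfEnabled runs the lifecycle in order when enabled is true and stops at
// the first error. When enabled is false it touches nothing.
func startIfEnabled(enabled bool, c optionalController) error {
	if !enabled {
		return nil
	}
	if err := c.RegisterCRD(); err != nil {
		return fmt.Errorf("registering CRDs: %w", err)
	}
	if err := c.Init(); err != nil {
		return fmt.Errorf("initializing controller: %w", err)
	}
	if err := c.StartWatch(); err != nil {
		return fmt.Errorf("starting watch: %w", err)
	}
	return nil
}

func main() {
	off := &recorder{}
	fmt.Println(startIfEnabled(false, off), off.steps)

	on := &recorder{}
	fmt.Println(startIfEnabled(true, on), on.steps)
}
```

Returning errors instead of calling `log.Fatalf` at each step is a choice made here so the sketch can be tested; the diff itself exits the process on any failure.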
10 changes: 10 additions & 0 deletions deploy/crds/core_v1_storagecluster_crd.yaml
@@ -154,6 +154,16 @@ spec:
their place. Once the new pods are available, it then proceeds onto other
StorageCluster pods, thus ensuring that at least 70% of original number of
StorageCluster pods are available at all times during the update.
disruption:
  type: object
  description: >-
    The default behavior is non-disruptive upgrades. This setting disables the default
    non-disruptive upgrades and reverts to the previous behavior of upgrading nodes in
    parallel without worrying about disruption.
  properties:
    allow:
      type: boolean
      description: Flag indicates whether updates are non-disruptive or disruptive.
deleteStrategy:
type: object
description: Delete strategy to uninstall and wipe the storage cluster.
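The `disruption` description above says non-disruptive upgrades are the default and that the setting reverts to the older parallel behavior. One reading of that, sketched below, is that upgrades are disruptive only when `allow` is explicitly set to true. The `disruption` and `updateSettings` types and the `disruptiveUpgradesAllowed` helper are hypothetical; the hunk does not show the Go types or the parent object, so treat this as an assumption about the semantics rather than the operator's implementation.

```go
package main

import "fmt"

// disruption is a hypothetical Go mirror of the `disruption` object in the
// CRD hunk. A pointer distinguishes "unset" from an explicit false.
type disruption struct {
	Allow *bool `json:"allow,omitempty"`
}

// updateSettings is a hypothetical parent object; the hunk does not show
// where `disruption` is nested.
type updateSettings struct {
	Disruption *disruption `json:"disruption,omitempty"`
}

// disruptiveUpgradesAllowed returns true only when allow is explicitly true.
// Every unset level falls back to the non-disruptive default.
func disruptiveUpgradesAllowed(u *updateSettings) bool {
	if u == nil || u.Disruption == nil || u.Disruption.Allow == nil {
		return false
	}
	return *u.Disruption.Allow
}

func main() {
	yes := true
	fmt.Println(disruptiveUpgradesAllowed(nil))
	fmt.Println(disruptiveUpgradesAllowed(&updateSettings{Disruption: &disruption{Allow: &yes}}))
}
```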
144 changes: 144 additions & 0 deletions deploy/crds/portworx.io_portworxdiags.yaml
@@ -0,0 +1,144 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.11.3
  creationTimestamp: null
  name: portworxdiags.portworx.io
spec:
  group: portworx.io
  names:
    kind: PortworxDiag
    listKind: PortworxDiagList
    plural: portworxdiags
    shortNames:
    - pxdiag
    singular: portworxdiag
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - description: Status of the Portworx diag collection.
      jsonPath: .status.phase
      name: Status
      type: string
    - description: Age of the diag resource.
      jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1
    schema:
      openAPIV3Schema:
        description: PortworxDiag represents a portworx diag
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: PortworxDiagSpec is the spec used to define a portworx diag.
            properties:
              portworx:
                description: Configuration for diags collection of the main Portworx
                  component.
                properties:
                  generateCore:
                    description: Generates the core dump as well when collecting the
                      diags. Could be useful to analyze the current state of the system.
                    type: boolean
                  nodes:
                    description: Nodes for which the diags need to be collected. If
                      a volume selector is also specified, then both the selectors
                      will be honored and the selected nodes will be a union of both
                      selectors.
                    properties:
                      all:
                        description: Select all nodes in the Portworx cluster. If
                          set to true, other selectors are ignored.
                        type: boolean
                      ids:
                        description: Ids of the nodes to be selected.
                        items:
                          type: string
                        type: array
                      labels:
                        additionalProperties:
                          type: string
                        description: Labels of the volumes to be selected.
                        type: object
                    type: object
                  volumes:
                    description: Volumes for which the diags need to be collected.
                    properties:
                      ids:
                        description: Ids of the volumes to be selected.
                        items:
                          type: string
                        type: array
                      labels:
                        additionalProperties:
                          type: string
                        description: Labels of the volumes to be selected.
                        type: object
                    type: object
                  collectPodLogs:
                    description: Flag indicating whether we want to collect pod logs too or not.
                    type: boolean
                type: object
            type: object
          status:
            description: PortworxDiagStatus is the status of a portworx diag.
            properties:
              clusterUuid:
                description: UUID of the Portworx cluster. This is useful to find
                  the uploaded diags.
                type: string
              message:
                description: Optional message used to give the reason for any failure.
                type: string
              nodes:
                description: Status of the diags collection from all the selected
                  nodes.
                items:
                  description: Status of the diags collection from a single node.
                  properties:
                    message:
                      description: Optional message used to give the reason for any
                        failure.
                      type: string
                    nodeId:
                      description: ID of the node for which the diag status is reported.
                      type: string
                    status:
                      description: One word status of the diags collection on the
                        node.
                      type: string
                  type: object
                type: array
              collectPodLogs:
                description: Status of the diags collection from all the pods with label 'portworx'.
                properties:
                  message:
                    description: Optional message used to give the reason for any failure.
                    type: string
                  status:
                    description: One word status of the diags collection on the pod.
                    type: string
                type: object
              phase:
                description: One word status of the entire diags collection job.
                type: string
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
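The `nodes` selector description in the CRD above states two rules: `all: true` makes the other selectors irrelevant, and when a volume selector is also given, the selected nodes are the union of both. The sketch below illustrates only those two rules. The `node` and `nodeSelector` types and the `selectNodes` helper are hypothetical, the mapping from selected volumes to node IDs is assumed to be resolved by the caller, and combining `ids` with `labels` as a union (with a label selector matching when all its pairs match) is an assumption the schema does not spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeSelector is a hypothetical mirror of spec.portworx.nodes.
type nodeSelector struct {
	All    bool
	IDs    []string
	Labels map[string]string
}

// node is a hypothetical view of a cluster node.
type node struct {
	ID     string
	Labels map[string]string
}

// matchesLabels reports whether every wanted pair is present on the node.
// An empty selector matches nothing so that an unset field selects no nodes.
func matchesLabels(have, want map[string]string) bool {
	if len(want) == 0 {
		return false
	}
	for k, v := range want {
		if have[k] != v {
			return false
		}
	}
	return true
}

// selectNodes returns the sorted IDs of the selected nodes. volumeNodeIDs are
// the nodes implied by the volume selector, resolved by the caller.
func selectNodes(nodes []node, sel nodeSelector, volumeNodeIDs []string) []string {
	picked := map[string]bool{}
	if sel.All {
		// All nodes are selected; other selectors cannot add anything.
		for _, n := range nodes {
			picked[n.ID] = true
		}
	} else {
		for _, id := range sel.IDs {
			picked[id] = true
		}
		for _, n := range nodes {
			if matchesLabels(n.Labels, sel.Labels) {
				picked[n.ID] = true
			}
		}
		// Union with the nodes coming from the volume selector.
		for _, id := range volumeNodeIDs {
			picked[id] = true
		}
	}
	out := make([]string, 0, len(picked))
	for id := range picked {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	cluster := []node{
		{ID: "n1", Labels: map[string]string{"zone": "a"}},
		{ID: "n2"},
		{ID: "n3"},
	}
	sel := nodeSelector{Labels: map[string]string{"zone": "a"}}
	fmt.Println(selectNodes(cluster, sel, []string{"n3"}))
}
```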