When a pgBackRest backup takes longer than the interval to the next scheduled run, it will create a storm of Failed Pods #3439
Comments
Agreed. Even if pgBackRest is able to take concurrent backups, we don't need multiple running for the same repo+type.
Yes, I believe so. If you want to make this change, please use the constants defined in the batch/v1 package.
Only one pgBackRest backup can run at a time. A scheduled backup that runs too long can cause the next scheduled backup to fail and retry multiple times. Skip that next one instead.
Co-authored-by: Scott Zelenka <szelenka@cisco.com>
Issue: #3439
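As a rough illustration of the behavior this change relies on, the sketch below models how a CronJob's concurrency policy decides whether a scheduled run starts while the previous one is still active. The `ConcurrencyPolicy` type and its three constants mirror the names and string values in `k8s.io/api/batch/v1`, but they are redeclared locally so the example builds with only the standard library. `shouldStart` is a hypothetical helper for illustration, not part of PGO or Kubernetes.

```go
package main

import "fmt"

// ConcurrencyPolicy mirrors the string type of the same name in
// k8s.io/api/batch/v1. It is redeclared here (an assumption made for the
// sketch) so that the example compiles without any Kubernetes modules.
type ConcurrencyPolicy string

const (
	AllowConcurrent   ConcurrencyPolicy = "Allow"   // run overlapping Jobs side by side
	ForbidConcurrent  ConcurrencyPolicy = "Forbid"  // skip the new run if the previous one is active
	ReplaceConcurrent ConcurrencyPolicy = "Replace" // cancel the active run and start the new one
)

// shouldStart is a hypothetical helper that reports whether a newly
// scheduled run would be started under the given policy.
func shouldStart(policy ConcurrencyPolicy, previousStillRunning bool) bool {
	if !previousStillRunning {
		return true
	}
	// Only Forbid skips the new run; Allow and Replace both start it.
	return policy != ForbidConcurrent
}

func main() {
	policies := []ConcurrencyPolicy{AllowConcurrent, ForbidConcurrent, ReplaceConcurrent}
	for _, policy := range policies {
		fmt.Printf("policy=%s previousRunning=true start=%t\n",
			policy, shouldStart(policy, true))
	}
}
```

With `Forbid`, a backup that overruns its schedule causes the next run to be skipped rather than started and failed, which is the outcome asked for in this issue. In real code the constant would come from the batch/v1 package, as suggested in the comment above.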
* Replace HandleDeleteNamespace Test With KUTTL (#3172) TestReconcilerHandleDeleteNamespace was prone to flakes when run with `envtest-existing`, and so is here replaced by a KUTTL test with matching functionality. Issue [sc-14273] * Update root CA certificate ownership kuttl test Adds better check logic to account for potential race conditions that may be encountered in some environments due to delays in garbage collection and ownership updating. Also fixed a comment and harmonized filenames with existing patterns. * Remove envtest-existing from upgradecheck (#3158) * Remove envtest-existing from upgradecheck `envtest-existing` tests have been flaky and we are moving towards KUTTL tests for e2e PostgresCluster behavior; several tests in the `upgradecheck` package were originally written as `envtest-existing` but are not really suitable as KUTTL tests, so this PR changes them from `envtest-existing` to `envtest` Issue [sc-14243] * Remove CrunchyData packages from PGO controller image This update allows the PGO controller image to be built without CrunchyData specific RPMs. All existing make targets continue to function in the same way as before, but the PGO controller image no longer utilizes the base image. The base image is still used by the Crunchy Postgres Exporter image. Issue: [sc-14268] * Update OLM bundle generation This commit makes the following changes to the OLM bundle generation logic: - Update the version replacement value for OLM to 5.0.5 - Update the minimum supported Kubernetes version to 1.19 - Update logo files - Update related images to exclude PG 12 and PG Upgrade (only in marketplace, removed to provide consistent images) - Fix operator annotations for certified and marketplace - Update README with information regarding issues encountered with 5.1.0 bundles - Update post bundle generation README instructions - Update generation logic to match expected file, project and package names. 
- Add a comment that minKubeVersion must support the related OCP version range. Issue: [sc-13935] * Remove code that generates the GCP installer Issue: [sc-12828] * Enable seccomp on containers (#3193) As of Kubernetes v1.19, SecurityContext has a seccompProfile field that can be set to RuntimeDefault to limit syscalls. This PR adds that setting to the containers in order to (a) limit syscalls from PGO-managed containers, while (b) not preventing users from using other tools involving sidecars, etc. Issue [sc-11286] * Deflake TestReconcileReplicaCreateBackup (#3198) TestReconcileReplicaCreateBackup was flaking in envtest-existing runs; experimentation revealed this was due to garbage collection. Following current practice, this PR skips the test in envtest-existing runs. Issue [sc-14382] * Add Script for Updating the Monitoring Installer Adds a script for updating the "monitoring" Kustomize installer in the PGO examples repo using specific pgMonitor tag provided. Issue: [sc-13611] * Mention support for certified Kubernetes distros Issue: [sc-14373] * Add missing image parameter in documentation Issue: [sc-14406] * Link to collection notice Issue: [sc-13940] * Update pgAdmin4 docs login information pgAdmin requires that the login username be formatted as an email. When syncing PGO users with the pgAdmin database we add the `@pgo` suffix to match this formatting. This change updates the documentation to match this change. * Kuttl test to create a cluster and resize the PVC This test creates two simple clusters with a single primary and a repo host. In the first cluster we create data then increase the size of the pvc. Then we check that the pvc size has changed, the size matches the new expected size and the data is still present. In the second cluster we attempt to decrease the size of the volume and expect the PersistentVolumeError. * Pre-release update for v5.1.1 (#3200) * Pre-release update for v5.1.1 [sc-14408] * Fix typo in Extension Management.
* Update update-cluster.md * Update update docs (#3202) Revise update docs (a) add note about potential automatic rollout of clusters when upgrading (b) spin off separate upgrade section, with v4-v5 subsection (c) tweak a little Issue [sc-14467] * updated from pg13 to pg14 in the update cluster instructions (#3209) * updated from pg13 to pg14 in the update cluster instructions * returned values to prior version to ensure images are present to run k3d(s) tests * Add docs on removing PVC labels When migrating from v4 to v5, some legacy labels may remain and cause unintended behavior. This PR adds documentation around that issue and the manual fix (done manually to avoid PGO having to remove labels). Issue [sc-14477] * PR feedback * Revert "Enable seccomp on containers (#3193)" (#3215) * Revert "Enable seccomp on containers (#3193)" This reverts commit 6193560. * update Release notes * Align Related Images in manager.yaml With OLM The related images in the manager.yaml file now align with the related images configured for OLM using related-images.yaml. Issue: [sc-14517] * Wait for Patroni labels in tests that switchover * Check for Endpoints in deletion tests We do not set ownership on Patroni DCS Endpoints. These tests should verify that our controller is deleting them. See: c13154e * Updates for PG 10 looping support PG 10 does not have stored procedures that support embedded transaction. To get around this we use a bash and kubectl loop * Update Github question template Update general issue template to include necessary detail information for incoming questions. Issue [sc-14613] * Bump gopkg.in/yaml.v3 to v3.0.0 This addresses CVE-2022-28948. * Simplify the PKI implementation The original implementation dynamically assigns functions that return errors so we can swap them under test. Errors from these calls are wrapped in sentinels so they can be identified at runtime. In practice, however, these errors are never examined. - Sentinel errors are removed.
The "encoding/pem.Decode" function does not return errors, so we still generate our own in two places. - All "Parse" functions are removed and replaced by their "Unmarshal" equivalents. - Most "New" functions are removed. One remains to generate a fresh root CA certificate and private key pair. - IP addresses are removed. Fields on the "Certificate" and "PrivateKey" types are not exported, making them opaque to consumers except for the PEM marshaling methods. This provides a few benefits: - The algorithms for keys and signatures can change without affecting callers. - Certificates are parsed as they are generated and unmarshaled. Their values are always either zero or fully parsed. - The root CA is parsed once per reconcile loop rather than once per leaf. - Getter methods return copies so that certificate fields cannot change. Issue: [sc-14620] * Document that PKI objects marshal for OpenSSL PostgreSQL, Patroni, pgBackRest, and PgBouncer all use certificates through OpenSSL bindings. The format emitted by "MarshalText" is already compatible with OpenSSL, so document that and add tests to enforce it. * Consolidate PKI choices in a single file It is easier to evaluate curves, curve parameters, signature algorithms, key lengths, certificate constraints, and validity periods when they are all in one place. * Return API errors when checking certificates The changes to certificate parsing in a prior commit make it clear that we are swallowing errors from the Kubernetes API in most places where we check if a certificate needs to be regenerated. Issue: [sc-14620] * Replace certificates when their subject changes We want to recreate certificates when their contents do not meet our requirements. This includes the subject common name (CN) and subject alternative names (SANs). Issue: [sc-14620] * Parse certificates and keys when their Secret exists Also explain why parse errors can be ignored. 
Issue: [sc-14620] * Rotate leaf cert before expiration (#3229) * Rotate leaf cert before expiration -- go with 2/3rd lifespan as per cert-manager * update docs * fix shellcheck Issue [sc-11173] Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Create an EventRecorder for tests The FakeRecorder provided by "k8s.io/client-go" converts each event to a string and sends it to a channel. Tests that want to check the Type or Reason separately from the Message have to resort to regexp captures. Instead, this implementation does the same work as EventRecorder without trying to batch or correlate isomorphic events. Calls to the recorder are stored in a slice of events/v1.Event that tests can interrogate. * Change the volume claim test into a unit test This is the last API test that requires a full Kubernetes cluster, and it flakes during PR checks. We added an end-to-end KUTTL test for resizing volumes in 112c910, so the remaining value of this test is in contrived scenarios that trigger a handful of error paths. Reduce the test to those paths with errors mimicking those from the API. Describe the scenarios that lead to those errors and link to their origins in Kubernetes. Issue: [sc-14270] See: 112c910 Co-authored-by: jmckulk <joseph.mckulka@crunchydata.com> * Pause/Resume PostgresCluster Reconciliation Adds the ability to pause the Postgres cluster reconciliation process by setting the `spec.paused` attribute to `true`. Pausing a cluster suspends any changes to the cluster’s current state until reconciliation is resumed. Reconciliation is resumed by either setting `spec.paused` to `false` or removing the setting from your manifest. Issue: [sc-11606] * Remove 'LastTransitionTime' from 'handlePersistentVolumeClaimError' The 'SetStatusCondition' function already sets 'LastTransitionTime', so remove that setting from the 'handlePersistentVolumeClaimError' method.
Reference: - https://github.com/kubernetes/apimachinery/blob/v0.20.8/pkg/api/meta/conditions.go#L30 * Skip TestDeleteInstance when connected to an existing cluster Other controllers touch PersistentVolumeClaims and StatefulSets after we create them, causing conflicts when we delete them with preconditions. Outside of tests, the entire reconciliation is retried, so skip this test for now. * Add support for feature gates Adds a feature gate capability to PGO by leveraging the relevant Kubernetes packages. This will allow users to enable or disable certain features by setting the "PGO_FEATURE_GATES" environment variable to a list similar to "feature1=true,feature2=false,..." in the PGO Deployment. Issue [sc-14488] * Use timeline as status to prevent multiple failovers (#3235) * Get timeline from Patroni before failing/switching over * Update delete KUTTL test * Get timeline from patroni * PR feedback Issue [sc-14610] * All Custom Sidecars for PostgreSQL Instance Pods This commit allows you to configure custom sidecar Containers for any of your PostgreSQL instance Pods. To use this feature, currently in `Alpha`, you will need to enable it via the relevant PGO feature gate. This is done by setting the `PGO_FEATURE_GATES` environment variable on the PGO Deployment to 'PGO_FEATURE_GATES="InstanceSidecars=true' Issue: [sc-12621] * Update the 'Create TODO patch Script' for instance sidecar containers This commit updates the script used to patch the PostgresCluster CRD to remove any 'TODO' references from the upstream Container spec. It also updates the generated patch file and modifies the script's 'yq' command to more clearly use Python YQ. * Update conditions.yaml for sidecar containers PR Add an entry to conditions.yaml to remove a newline character from the seccompProfile type description so the 'trailing space' documentation linter will pass. 
* Allow Custom Sidecars for pgBouncer Pods This commit allows you to configure custom sidecar Containers for your pgBouncer Pods. To use this feature, currently in `Alpha`, you will need to enable it via the relevant PGO feature gate. This is done by setting the `PGO_FEATURE_GATES` environment variable on the PGO Deployment to 'PGO_FEATURE_GATES="PGBouncerSidecars=true' Also adds an entry to conditions.yaml to remove a newline character from the seccompProfile type description so the 'trailing space' documentation linter will pass and updates todos.yaml to remove any 'TODO' references from the upstream Container spec. Issue: [sc-14727] * Update Custom Sidecar Containers for PostgreSQL Instance Pods Comment Updates the custom sidecar container comment on the PostgreSQL instance set spec to mention the restart behavior and conform to the pgBouncer custom sidecar container comment format. * Add wait for delete test (#3264) * Add wait for delete test * Lower timeout, quote pod name * Use ReadWriteOnce through documentation It is the most commonly supported access mode. Issue: [sc-14874] * Add custom scheduling for backup jobs (#3260) * add Affinity, Tolerations to backup jobs * add unit testing * clean up references to restarting if certain fields change Issue [sc-11582] * Change use_pg_rewind for PG10 (#3258) * don't use use_pg_rewind for pg10 * update KUTTL test to reinit pg10 for PITR Issue [sc-12408] * Drop default container runtime capabilities The restricted profile of Kubernetes' Pod Security Standards requires dropping all POSIX capabilities. Issue: [sc-10828] See: https://docs.k8s.io/concepts/security/pod-security-standards/ * Allow Streaming Replication Clusters can now be configured to automatically enable streaming replication from a remote primary. 
- The `spec.standby` section of the postgrescluster spec allows users to define a `host` and `port` that point to a remote primary - The `repoName` field is now optional - Certificate auth is required when connecting to the primary. Users must configure custom tls certs on the standby that allow this authentication method - Replication user will be the default `_crunchyrepl` user - A cluster will not be created if the standby spec is invalid - kuttl: deploy two clusters, a primary and standby, in a single namespace. Ensure that the standby cluster has replicated the primary data and the walreceiver process is running * update release from 5.1.1 to 5.1.2 added release documentation (#3280) [sc-14902] * Fix 'GCS' Typo in Azure Storage Blob section The Azure Storage Blob section contains the following sentence: "Similar to the above, setting up backups in Azure Blob Storage requires a few additional modifications to your custom resource spec and the use of a Secret to protect your GCS credentials." it should read: ``` Similar to the above, setting up backups in Azure Blob Storage requires a few additional modifications to your custom resource spec and the use of a Secret to protect your Azure Storage credentials. ``` * Remove unnecessary type conversions * Update GitHub actions * Use GitHub step summaries to report coverage * minor typo * Update Release Notes * Quarantine flaky delete test (#3290) * Quarantine flaky delete test The `delete` test that looked at event timestamps to make sure the replica deleted before the primary occasionally flaked out. This PR removes that timestamp checking, quarantining that version of the test for future debugging; and changes the in-use test to simply verify that a cluster with replica deletes. This PR also fixes an error in the delete tests where the -delete.yaml was incorrectly set up.
Issue [sc-15009] * OLM validation update Update the 'validate_bundle_image' function in validate-bundles.sh to remove the command that generates the updated registry database. This command is no longer required when validating the OLM bundles. Also updates the README to address this change and add a troubleshooting section. Issue: [sc-15044] * Allow NodePort Port to be Specified via the PostgresCluster Spec This update allows a specific NodePort port to be specified for the primary Postgres, pgBouncer and pgAdmin services via the PostgresCluster spec. Note this is used when type is NodePort or LoadBalancer only. Setting this value when using the 'ClusterIP' type will result in an error. The specified value must also be in range and not currently in use or the operation will fail. If unspecified, a port will be allocated if this Service requires one as before. Resolves #3008 Issue: [sc-14918] * Generate a non-expiring token in development The LegacyServiceAccountTokenNoAutoGeneration feature gate is enabled by default in Kubernetes v1.24. Issue: [sc-11491] * Labels and Annotations for Individual Services This update adds support for labeling and annotating the Postgres, pgAdmin and pgBouncer services individually. This allows these services reconciled by PGO to have certain labels and/or annotations configured that are not set on any other PGO objects. Issue: [sc-14916] resolves: #3265 * added documentation for root certificate rotation (#3298) * added documentation for root certificate rotation [sc-14561] * Update docs/content/tutorial/administrative-tasks.md Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * updated per pr comments Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Set the 'pg_ctl' timeout This commit sets the 'pg_ctl' timeout to a very large value (1 year in seconds) to ensure there are no timeouts when starting or stopping Postgres.
Issue [sc-15140] * Add fsGroupChangePolicy to pod (#3296) * Add fsGroupChangePolicy to pod Issue [sc-14235] * Bump test behavior around fsGroupChangePolicy * bump test k8s 1.19=>1.20 for github k3d test action * specify 1.19 for kubernetes-api test action, alter tests to check for k8s version and pass with 1.19 (no fsGroupChangePolicy in check) and >=1.20 (fsGroupChangePolicy in check) * update to drop all capabilities security context (#3305) [sc-14936] * Align psql Job Backoff Limit & Restart Policy Sets the "backoffLimit" to "6", and the "restartPolicy" to "never", for all psql Jobs in the Kuttl test suite. This has been done to address psql Jobs that are sometimes reaching the current backoff limit and failing, while also better aligning all psql Jobs within the Kuttl test suite. Additionally, a "restartPolicy" of "never" should also help facilitate the debugging of failed psql Jobs. * Remove the postmaster.pid file prior to pgBackRest restore This commit removes the postmaster.pid file, if it exists, from the PGDATA directory before attempting a restore. This allows the restore to be tried more than once without causing an error due to the presence of the file in subsequent attempts or in scenarios where the file is otherwise present. Issue: [sc-15157] * Update the pgBackRest restore command for better logging This commit updates the pgBackRest restore script so that the restore command arguments are displayed in the restore Job logs. * Update Standby Replication Diagrams This commit updates the existing repo-based standby cluster configuration diagram to a new Draw.io generated image and adds the associated XML file. It also creates two new diagrams to illustrate a streaming standby cluster configuration and a cluster that is configured to have both a streaming standby and an external repo. Issue: [sc-14710] * Add name and version Labels to CRD during generation Adds the name and version labels, i.e.
app.kubernetes.io/name: pgo app.kubernetes.io/version: 5.1.2 to the PostgresCluster CRD generation process and update the current CRD to match. This will align all of our CRDs across install method. * Branch in tests based on the server version rather than environment When there is no environment variable defined, the envtest tools use a default version of the Kubernetes API. Interrogating the API works regardless of any tooling. See: 7ed8677 * Use Bash to assert on dropped caps in E2E tests OpenShift appends to the list of dropped capabilities, and KUTTL is unable to assert a subset of that list. Do the assertion ourselves in a script rather than create a copy of the test specifically for OpenShift. Issue: [sc-15297] See: kudobuilder/kuttl#76 * Set runAsNonRoot at the container-level only Some service meshes require privileged init-containers or sidecars, and the pod-level setting prevents these from working correctly. We satisfy Kubernetes' Restricted Pod Security policy by setting "runAsNonRoot" for all our containers, so setting it on the pod is redundant. Issue: [sc-15204] See: https://kubernetes.io/docs/concepts/security/pod-security-admission/ See: https://kubernetes.io/docs/concepts/security/pod-security-standards/ * Verify security contexts using the Kyverno CLI when available * Go package updates This commit updates the go-yaml, client_golang and golang crypto packages. Issue: [sc-15314] * Bump 5.1.2 to 5.2.0 * Update components and extensions * Wrap PITR sections of the docs There was a typo around column 300 that went unnoticed. Adjust some wording along the way. Issue: [sc-14869] * Update PostgreSQL cluster architecture diagram Replaces the existing PostgreSQL cluster architecture diagram, adds the relevant draw.io xml file, deletes the old image file and adjusts the documentation around the new image. 
Co-authored-by: @cbrianpace Issue: [sc-15266] * go fmt with Go 1.19 to address lint errors * update certificate rotation by combining 2 sections of the documents * Update docs/content/tutorial/administrative-tasks.md Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Update SHA value placeholders for OLM bundle generation This commit updates the SHA placeholders used during OLM bundle generation to a more unique value. This will better facilitate replacement when adding the SHAs. * Link to the CLI documentation in release notes Issue: [sc-15449] * Fix typos in the latest release notes * Update upgrade docs header * Fix related image registry value Remove the duplicate 'crunchydata' from the registry value of the pgAdmin image information. * Fix off-by-one in related images Issue: [sc-15680] * Fix typo in docs/content/references/components.md * Update links in readmes (#3378) * update links * update linter (disable contextcheck, add contextcheck to .next) * update test: pin to 1.24 rather than latest Issue [sc-15609] * Update v1.SecurityContexts to current Pod Security Standards The restricted policy changed in Kubernetes 1.23 with the addition of Pod Security Admission. The seccomp profile will need to be revisited due to OpenShift. Issue: [sc-14232] See: https://docs.k8s.io/concepts/security/pod-security-admission/ See: https://docs.k8s.io/concepts/security/pod-security-standards/ * Update runtime-controller (#3362) * Remove unused SSA workarounds for Kubernetes 1.18 We have not supported Kubernetes 1.18 for some time now. OpenShift 4.6 is based on Kubernetes 1.19. 
* Update runtime-controller * update runtime-controller * adjust logger * adjust envtest * adjust tests Issue [sc-12818] * update crd * remove potentially unnecessary cleanup Co-authored-by: Chris Bandy <chris.bandy@crunchydata.com> * Turn off JIT for only monitoring user's context It prevents issues related to monitoring queries: - slow query executing due to unnecessary inlining, optimization and emission - memory leak due to re-creating struct types during inlining related issues (CrunchyData/crunchy-containers#1381) (CrunchyData/pgmonitor#182) On the other hand database is open to enabling JIT for other users Issue: [sc-15755] Signed-off-by: Kirill Petrov <chobostar85@gmail.com> * Update crd-docs (#3391) * CRD & doc update Issue: [sc-12818] * Change linter GH action This splits the GH linter action that was checking for TODOs and trailing spaces in the documentation into two actions: * one that checks TODOs and trailing spaces in all files except the crd * one that checks TODOs only in the crd.md file * Update monitor versions in deps scripts (#3394) Updates pgMonitor and postgres-exporter version in dep scripts. Issue: [sc-15707] * Fix compatibility with Kubernetes 1.25 (#3370) * batchv1beta1 => batchv1, policyv1beta1 => policyv1 This changes in particular: * policyv1beta1.PodDisruptionBudget => policyv1.PodDisruptionBudget * batchv1beta1.CronJob => batchv1.CronJob * Run tests with kubernetes 1.21. * Update .github/workflows/test.yaml Co-authored-by: Benjamin Blattberg <benjamin.blattberg@gmail.com> * Update links to pgAdmin code and documentation The Git repository for this project moved around the same time as its issue tracker. See: https://postgr.es/m/CA+OCxozG9KV_NCaU9juHCLWti+0hD+tWX053iL3A_S0Z=z9GQg@mail.gmail.com * Remove pki NoNames test OpenSSL 3.x returns an error when the subject name is empty on a cert. The cert is no longer valid so we don't need the test. 
* Adjust GH kubernetes-api test (#3405) * test against default Issue: [sc-15835] * PGO updates pgnodemx/pg_stat_statements (#3400) * PGO updates pgnodemx/pg_stat_statements Users reported that an updated image wouldn't trigger an update of monitoring extensions. This changes that behavior by * adding the monitor and pg image tags to the revision hash, * adding update lines to the pgmonitor enable action. Note: this _only_ targets these two extensions as updating other extensions should probably be under the user's power. Issue: [sc-14476] * add KUTTL test for exporter upgrade errors * Remove CentOS References from Docs * Update links to JDBC documentation The link we used for connection parameters and URIs was broken, 404. * update to go 1.19 from go 1.17 Issue: [sc-15423] * Update CRD and todo hack script for v0.23.0 * Add newlines to pgmonitor docs * Custom TLS for Exporter (Encryption Only) With this change we allow users to bring custom certificates and enable TLS for the exporter. This will be an opt-in feature, PGO will not automatically generate certs like it does for some other features. You can enable TLS by using the following spec fields: spec: monitoring: pgmonitor: exporter: customTLSSecret: name: hippo.tls Once TLS is enabled in the exporter, you can configure your Prometheus instance to scrape over https. * Operator logging for database init SQL failures (#3033) If there is an error in the init SQL that runs as part of reconcileDatabaseInitSQL, then there is no way for the user to know what the error is. Adding this additional log statement will make it easier for users to know when init sql operations have succeeded and/or failed. It also brings this part of the code up to par with other similar operations in the codebase. Issue: #3029 Co-authored-by: Jeff Martin <jeff.martin@previ.com> * Adding source code changes for workaround for IPv6 issue in pgBackRest (#1841). * Adding updated documentation for pgBackRest IPv6 workaround. 
* Update internal/pgbackrest/config_test.go Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Update internal/pgbackrest/config.go Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Changed code to use strings.EqualFold() for case-insensitive comparison. * Update pgBackRest repo option logic When taking a backup, PGO tries to help by not allowing the user to pass the "--repo" option. However, the current method for catching this results in catching any option that begins with "--repo", which prevents users from passing in perfectly valid options. This commit corrects the flag check to only block on exact matches of "--repo". Issue: [sc-16128] * Bumping kubebuilder:validation:Maximum for major PostgresVersion to 15. * Add constants for services registered with the IANA The PostgreSQL and pgBackRest protocols are both registered with the IANA according to RFC 6335. See: https://www.iana.org/assignments/service-names-port-numbers * Get primary name after waiting for redeploy * Update kuttl tests for Postgres 15 public schema updates With Postgres 15, the removal of PUBLIC creation permission on the public schema requires updates to our kuttl test logic. This commit allows the tests to perform as expected with these new changes by creating/referencing new schemas as needed. Note that these changes should not impact Postgres versions < 15. Issue: [sc-16289] * Alter make generate-kuttl to quiet output (#3442) * Pass the upgrade-check URL as an argument The global value is now a constant and somewhat easier to reason about. * Handle upgrade-check panics in a single place * Start and stop upgrade-check using controller-runtime Blocking functions can be added to a controller-runtime Manager so that they start after caches have started and synced. They also stop before caches have stopped. * Added namespace limiters to all client.List() calls in pgbackrest and volumes files in the controller.
Changed List calls to consistently use ListOptions struct or individual ListOption arguments, but not a mixture of both. Issue: [sc-13871] Issue: [sc-16139] Issue: CrunchyData/postgres-operator#3058 Issue: CrunchyData/postgres-operator#3364 * updated urls from github.io to the access portal ensuring users are looking at the latest documentation Issue: [sc-16478] * Move environment logging into main() * controller-runtime Source that emits a constant Event periodically * Single-method implementations of controller-runtime Client * Bridge API client Issue: [sc-16285] * Bridge installation reconciler Issue: [sc-16285] * Use optimistic concurrency and log retries The Kubernetes clients provided by controller-runtime Manager fetch from a cache. When fetching then writing back a single object, one should use the object's resourceVersion to avoid races and lost updates. Issue: [sc-16285] See: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency * Hide the progress bar when calling curl in tests * Migration assistance (#3445) * Log errors when the PostgreSQL data directory is wrong The postgres-startup container now reports when it finds the installed PostgreSQL binaries do not match the specified PostgreSQL version. Some storage providers do not mount the PostgreSQL data volume with correct ownership or permissions. The postgres-startup container now prints those attributes of parent directories when it cannot create or modify a needed file or directory. Issue: [sc-11804] Issue: CrunchyData/postgres-operator#2870 Co-authored-by: @cbandy * Change owner of the PostgreSQL directory at startup PostgreSQL won't start unless it owns the data directory. Kubernetes sets the group according to fsGroup but not the owner. The postgres-startup container now recreates the data directory to give it a new owner when permissions are sufficient to do so.
It now raises an error when the owner is incorrect and cannot be changed. Issue: [sc-15909] See: https://docs.k8s.io/tasks/configure-pod-container/security-context/ Co-authored-by: @cbandy * Add KUTTL test for migration from third-party PGSQL Issue: [sc-15909] * Add concurrencyPolicy to backup CronJobs Only one pgBackRest backup can run at a time. A scheduled backup that runs too long can cause the next scheduled backup to fail and retry multiple times. Skip that next one instead. Co-authored-by: Scott Zelenka <szelenka@cisco.com> Issue: CrunchyData/postgres-operator#3439 * Require SCRAM authentication of the monitoring user The PostgreSQL STIG requires that password authentication be done using scram-sha-256. Co-authored-by: Scott Zelenka <szelenka@cisco.com> Issue: CrunchyData/postgres-operator#3424 See: https://www.stigviewer.com/stig/crunchy_data_postgresql/2022-06-13/finding/V-233519 * Limit the monitoring user to local connections Issue: [sc-12218] * Remove disable exporter tls test Checking that tls has been disabled on a cluster (where it was previously enabled) is difficult. This is because we need to wait for the instance pod to be redeployed without tls configuration. We are removing this case from the kuttl test with plans to ensure we have the same coverage in go tests in the future. Issue: [sc-16572] * Pin GitHub actions to Ubuntu 20.04 The Ubuntu 22.04 runners include ShellCheck v0.8 which has new rules. Issue: [sc-13394] * Added a warning notice to the pgAdmin 4 architecture docs to let users know there are compatibility issues with pgAdmin 4 and pg15 Issue: [sc-16516] * Adding uniqueness to cluster names when testing service type changes to work around race condition that is causing these tests to flake. [sc-16571] * Moving PG Major Upgrades API to postgres-operator repo. [SC-16347] * Add PGUpgrades to the controller-gen TODO hack Issue: [sc-16347] * Do not configure JIT for the monitoring user PostgreSQL 10 does not have a "jit" parameter.
The current release of pgMonitor includes this fix and correctly applies it to specific versions of PostgreSQL. This partially reverts commit df492f1. Issue: [sc-15755] See: CrunchyData/pgmonitor#295 * Update security context kuttl test for OCP 4.11 Adjusts the SCC check to support the 'restricted-v2' SCC in addition to the 'restricted' SCC. * Make the TTL of pgBackRest backups configurable The default retention of one failed backup Job can leave a Job and its Pods in a failed state indefinitely. The TTL setting lets someone choose how long they want Jobs, Pods, and their logs to be available. This field is functional in Kubernetes 1.21 and OpenShift 4.8 where the TTLAfterFinished feature gate is enabled by default. Issue: [sc-14014] Issue: CrunchyData/postgres-operator#3444 * Bumping pgMonitor to v4.8.0. [SC-16701] * Update Version 5.2.0 to 5.3.0 Update PGO and Postgres versions for 5.3.0. Issue: [sc-16943] * Add Postgres 15 RELATED_IMAGE environment variable This commit adds the Postgres 15 RELATED_IMAGE environment variable to manager.yaml Issue: [sc-16943] * Add entries to bundle.relatedImages.yaml Add entries for Postgres 15, Postgres 14 with GIS 3.3 and Postgres 15 with GIS 3.3 images to the bundle.relatedImages.yaml file. Issue: [sc-16943] * Update the minimum Kubernetes and OCP OLM versions PGO 5.3.0 will support, per the documentation, Kubernetes 1.22-1.25 and OpenShift 4.8-4.11. However, the OLM bundle minKubeVersion must match the minimum OCP's included Kubernetes version, which is 1.21 per https://access.redhat.com/solutions/4870701. Therefore, this commit sets 'com.redhat.openshift.versions' to v4.8 and 'minKubeVersion' to 1.21.0 for our OLM bundle generation. 
Issue: [sc-16943] * Helm OCI Release Notes Issue: [sc-16943] * Add docs for helm oci (#3493) Co-authored-by: Chris Bandy <bandy.chris@gmail.com> Issue: [sc-16938] Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Update Postgres version 15.0 to 15.1 * Update comment for Metadata (#3496) Metadata is used by postgrescluster and pgupgrade * pgMonitor v4.8.0 Release Note Issue: [sc-16943] * Bump Build Number for PG 14 PostGIS 3.3 * Fix Typo for CLI in Release Notes * Update the default Postgres image used for Kuttl tests * Document Postgres 15 recovery_target_action behavior Postgres 15 behaves the same as Postgres 14 in this regard. * Remove the note about language in the pgBackRest docs The pgBackRest documentation seems clear enough to me now. * Integrating Major PG Upgrades controller logic and testing into PGO. [sc-16348] Co-authored-by: Tony Landreth <anthony.w.landreth@gmail.com> * Set operator image tag to release v5.4.0 After pulling major-upgrades into postgres-operator, a new image will be needed to install a fully functional operator. This commit bumps the tag on the operator image to the presently unreleased v5.4.0. Issue: [sc-16349] * Adds KUTTL_PG_UPGRADE_TO_VERSION parameter A new parameter is added to decouple settings between operator tests and upgrade tests. Issue: [sc-17416] * Update README.md Fix installation, otherwise it is not working. * Bumping min OCP version (#3509) * Pin checks to Kube 1.25 * Simplify Makefile A help target has been added that describes each target and groups them by category. Remove targets to push/pull images from gcr - now that we only have two images in this repo manually running the podman commands will be fine Remove option to push to docker daemon or build with sudo - with buildah and podman we don't typically need these options Update build targets - we had some logic in our image and binary build targets that was overly complicated now that we only have two images in this repo. 
Each binary and image has a single target used to build that particular resource. The names of these targets have been updated to improve readability. Random cleanup - Add phony targets - Remove relics of the past - remove images var that is now unused * Simplify postgres-operator dockerfiles This change simplifies the dockerfiles used to build our postgres-operator and crunchy-postgres-exporter images. We remove the concept of a base image and put all required layers in its own image. The postgres-operator image is now built from ubi8-micro and the exporter image is built using ubi8-micro. Remove setup scripts used to gather pgmonitor resources. This logic has been moved to the make get-pgmonitor and get-postgres-exporter targets * Add a GeoJSON assertion to the PostGIS Kuttl test Issue: [sc-13236] * Update PGO upgrade docs When upgrading to v5.4, Kustomize installations will require deletion of the pgo-upgrade deployment. Issue: [sc-16349] * Update Copyright notices for 2023 * Add trivy action to catch CVEs (#3544) Note: cron is set for testing purposes at the moment Issue: [sc-17241] * New generic function to dereference a non-nil pointer * Stop using the k8s.io/utils module directly The few functions we used were already available in an internal package. * Ensure go.mod is tidy during pull request checks We imported the "k8s.io/utils" module directly a few commits ago but neglected to update the "go.mod" file. * Update go.mod to avoid CVEs (#3548) Issue: [sc-17241] * Remove backup assertions from exporter test This test is not interested in backups and completes faster without those assertions. Issue: [sc-17016] * Remove backup assertions from streaming standby test This test completes faster without those assertions. Issue: [sc-17016] * Correct the comments on CodeQL actions CodeQL has changed and our Make targets have changed. 
* Update OLM bundle generation logic for postgres major upgrade This updates the OLM bundle generation logic to allow for the inclusion of the 'postgres-operator-upgrade' controller, the 'crunchy-upgrade' image and related PGUpgrade CRD and functionality. Related examples and documentation have been updated and all current images are included as required. Issue: [sc-17486] * updated pgaudit extension upgrade directions Issue: [sc-17351] * Update docs/content/guides/major-postgres-version-upgrade.md Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> * Update docs/content/guides/major-postgres-version-upgrade.md Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> * Update docs/content/guides/major-postgres-version-upgrade.md Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> * updated paragraph for clarity and grammar mistakes * Update docs/content/guides/major-postgres-version-upgrade.md Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> * Adding GitHub Actions Job for E2E testing. Refactoring kubernetes-k3d Job to use new K3d action for setting up k3d. Adjusting root-cert-ownership kuttl test to work with POSIX shell used in Github Actions. [sc-17404] * Adjusting create-kubeconfig.sh script to avoid race condition where the service-account-token secret was created, but the .data.token has not yet been populated. * Fix tests to work on macOS Ventura Shell utilities included in Ventura do not behave the same as GNU core utilities, and OpenSSL has been replaced with LibreSSL. * Updated go.mod Issue [sc-17837] * PGO will now turn "huge_pages" to "try" or "off" based on whether huge pages have been requested in the resource spec. [sc-17766] * Update docs/content/guides/huge-pages.md Co-authored-by: Tony Landreth <56887169+tony-landreth@users.noreply.github.com> * Update standby configuration documentation Update the docs to better reflect required value types in tutorial documentation. 
Issue: [sc-17928] * Bump github.com/onsi/ginkgo to v2 Recent versions of "sigs.k8s.io/controller-runtime" have switched to "github.com/onsi/ginkgo/v2" and dropped the "sigs.k8s.io/controller-runtime/pkg/envtest/printer" package. This change to tests should make updating controller-runtime easier in the future. * Update k3d and k3s URLs Things have moved away from the Rancher domain and organization. The URLs we were using redirect to these. * Add tablespace alpha functionality (#3575) * Adds the tablespaceVolumes field to the CRD; * Adds basic tablespace functionality: mounts the volumes and preps them with correct permissions; * Adds option for restoring with tablespaces (needs more testing); * Adds docs/content/guides/tablespaces * Adds a basic KUTTL test for creating a cluster with tablespaces; * Updates the github test to add the feature gate Issue: [sc-17759] * Regularize kubebuilder RBAC annotations (#3586) * Improvements to feature gate handling (#3599) a) improve deploy-dev to allow user to easily set b) print feature gates on startup * Update docs (#3604) Issue: [sc-18286] * Breaks out trivy-scheduled-scans Runs scheduled Trivy scans on the main and REL_4_7 branches. Issue: [sc-17407] * Removed Postgres 13 from RELATED_IMAGES. Now that we've had 2 patch releases of Postgres 15 we are dropping postgres 13. Issue [sc-17907] * Updated the GitHub Actions workflow with the latest container images * changed kuttl pg version back to pg 14 * Fix e2e-other/postgis-cluster KUTTL (#3628) Problem: PostGIS < v3.1 had trouble parsing result from ST_AsGeoJSON with ST_AsText function. 
Solution: Remove ST_AsText and check JSON directly Issue: [sc-18159] * Updated images to the latest versions and updated to postgres 15 Issue [sc-17991] * Update examples/postgrescluster/postgrescluster.yaml Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Clarifications to docs about restoring individual databases, plus additional links to CRD and cross-linking to improve readability * Changed Individual Databases paragraph into a warning, as per Andrew's suggestion. * Add extra comma, as per bblattberg * Refactor looping tests Instead of setting an amount of time that these loops are allowed to run, we can use an infinite loop that will fail when Kuttl hits its timeout. Issue [sc-18801] * Clarify custom tls documentation (#3629) * Add documentation about custom TLS secrets, clarifying replication secret common name * Bump streaming standby test secrets to have 10y expiration Issue: [sc-14645] * document that wal files are not deleted * typo in pgdata path * more verbose wording Co-authored-by: Drew Sessler <36803518+dsessler7@users.noreply.github.com> * Change buildah for new build process (#3646) Issue: [sc-19532] Issue: [sc-18718] * Update kustomization: patches (#3658) * Update kustomization.yaml (#3655) Update kyverno URI * Update component page info Issue: [sc-16032] * Updating Keycloak example documentation * Add warning blocks to hugepages doc. [sc-18155] * Renew Bridge installations Issue: [sc-16285] * Update exporter release target to build exporter * Revamp demoting active to standby (#3661) Issue: [sc-20085] * Update depguard configuration for golangci-lint v1.53 The depguard v2 linter allows different rules to be applied to different sets of files. See: golangci/golangci-lint#3795 See: https://github.com/OpenPeeDeeP/depguard#config See: https://golangci-lint.run/usage/linters/#depguard * Update HA Architecture Doc Revises the High Availability Algorithm section to bring it into alignment with our current configuration. 
Issue: [sc-20086] * adding Postgres primary & replica cert to Secret * Adding fix for hugepages/restore issue. [sc-20758] * Revise pgbouncer kuttl test to debug (#3683) Issue: [sc-21015] Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Fix README Links * Latest updates * Remove redundant trivy scans (#3695) * Update test workflow Issue: [sc-20728] * Refactor Delete Namespace test - Allow runner to define a namespace to delete through the makefile. This will be useful if two sets of kuttl tests are running in the same env - Move from 2 replicas to 1 to speed up the test - Use single line volume claim specs * Update Postgres Exporter version to 0.12.1 PGO-42 * Stop PostgresCluster reconciliation when required image not set This update prevents empty image values from impacting the reconciliation of a PostgresCluster. With this change, the impacted cluster will not be updated until the necessary images are defined and a corresponding warning event will be created. PostgresClusters with images properly defined will reconcile normally. Issue: [sc-21130] * Update the PGUpgrade logic for missing image scenario Adjusts the PGUpgrade logic to allow for easier recovery from a missing image scenario. Specific Conditions are more clearly defined and checking is added for the 'crunchy-upgrade' image. A Kuttl test scenario is also added. Issue: [sc-21130] * Latest updates * Adjust major upgrade kuttl tests Move major-upgrade-missing-image test to e2e-other and create shorter version, empty-image-upgrade. * Add Discord Info to README * Update Invite Code * Quiet issues detected by golangci-lint v1.54.2 gosec v2.17.0 detects more cases of pointers to loop variables. * Update apply_test to handle changes for Kubernetes 1.28+ Prior to 1.28.0, certain no-op server-side apply updates bumped the resourceVersion value. For new Kubernetes versions this behavior has been adjusted so that resourceVersion is not bumped. 
This change adds an additional check for the server version to allow the correct test to be executed. * Remove kubectl '--short' flag from Github actions The 'short' flag is now deprecated. The default output for kubectl is now equivalent to the previous shortened output. - https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#deprecation * Exporter refactor. Remove all of crunchy-postgres-exporter from this repo. Refactor postgres-operator to hold the setup.sql and queries.yml files used by the postgres_exporter. Add logic to postgres-operator to replace the functionality that was in the start.sh script that will be removed from the exporter image. Adjust testing accordingly. * Version updates * Force `InstanceSidecar` feature gate to be enabled * fix configs --------- Signed-off-by: Kirill Petrov <chobostar85@gmail.com> Co-authored-by: Benjamin Blattberg <ben.blattberg@crunchydata.com> Co-authored-by: TJ Moore <tj.moore@crunchydata.com> Co-authored-by: Chris Bandy <chris.bandy@crunchydata.com> Co-authored-by: Andrew L'Ecuyer <andrew.lecuyer@crunchydata.com> Co-authored-by: jmckulk <joseph.mckulka@crunchydata.com> Co-authored-by: Val <ValClarkson@users.noreply.github.com> Co-authored-by: atorik <atorik@gmail.com> Co-authored-by: Chris Bandy <bandy.chris@gmail.com> Co-authored-by: Benjamin Blattberg <benjamin.blattberg@gmail.com> Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> Co-authored-by: Brandon Avant <avant.brandon@gmail.com> Co-authored-by: Andy Li <andy@onthewings.net> Co-authored-by: ValClarkson <valerie0149@gmail.com> Co-authored-by: Shinya Kato <u361141e@gmail.com> Co-authored-by: Kirill Petrov <chobostar85@gmail.com> Co-authored-by: Jelmer Vernooij <jelmer@jelmer.uk> Co-authored-by: ValClarkson <valerie.clarkson@crunchydata.com> Co-authored-by: Jeff Martin <jam263@gmail.com> Co-authored-by: Jeff Martin <jeff.martin@previ.com> Co-authored-by: Drew Sessler <drew.sessler@crunchydata.com> Co-authored-by: Drew 
Sessler <36803518+dsessler7@users.noreply.github.com> Co-authored-by: szelenka <szelenka@gmail.com> Co-authored-by: Scott Zelenka <szelenka@cisco.com> Co-authored-by: Tony Landreth <anthony.w.landreth@gmail.com> Co-authored-by: Sergey Pronin <spron-in@users.noreply.github.com> Co-authored-by: David Youatt <david.youatt@crunchydata.com> Co-authored-by: Tony Landreth <56887169+tony-landreth@users.noreply.github.com> Co-authored-by: Roberto Mello <roberto.mello@gmail.com> Co-authored-by: Stefan Midjich <swehack@gmail.com> Co-authored-by: Stefan Midjich <stemid@users.noreply.github.com> Co-authored-by: David Jeffers <david@dajeffers.com> Co-authored-by: Anthony Landreth <tony.landreth@crunchydata.com>
* Update internal/pgbackrest/config.go Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Changed code to use strings.EqualFold() for case-insensitive comparison. * Update pgBackRest repo option logic When taking a backup, PGO tries to help by not allowing the user to pass the "--repo" option. However, the current method for catching this results in catching any option that begins with "--repo", which prevents users from passing in perfectly valid options. This commit corrects the flag check to only block on exact matches of "--repo". Issue: [sc-16128] * Bumping kubebuilder:validation:Maximum for major PostgresVersion to 15. * Add constants for services registered with the IANA The PostgreSQL and pgBackRest protocols are both registered with the IANA according to RFC 6335. See: https://www.iana.org/assignments/service-names-port-numbers * Get primary name after waiting for redeploy * Update kuttl tests for Postgres 15 public schema updates With Postgres 15, the removal of PUBLIC creation permission on the public schema requires updates to our kuttl test logic. This commit allows the tests to perform as expected with these new changes by creating/referencing new schemas as needed. Note that these changes should not impact Postgres versions < 15. Issue: [sc-16289] * Alter make generate-kuttl to quiet output (#3442) * Pass the upgrade-check URL as an argument The global value is now a constant and somewhat easier to reason about. * Handle upgrade-check panics in a single place * Start and stop upgrade-check using controller-runtime Blocking functions can be added to a controller-runtime Manager so that they start after caches have started and synced. They also stop before caches have stopped. * Added namespace limiters to all client.List() calls in pgbackrest and volumes files in the controller. Changed List calls to consistently use ListOptions struct or individual ListOption arguments, but not a mixture of both. 
Issue: [sc-13871] Issue: [sc-16139] Issue: CrunchyData/postgres-operator#3058 Issue: CrunchyData/postgres-operator#3364 * This commit adds the ability to append custom queries to the default exporter queries. This ability is feature gated with the AppendCustomQueries flag. The exporter kuttl tests have been adjusted. Tests that need the feature gate turned on have been added to e2e-other. This commit also moves some things from the postgrescluster package to the pgmonitor package. * Update workflow Kubernetes test versions * Make PGO backwards compatible with older crunchy-postgres-exporter im… (#3728) * Make PGO backwards compatible with older crunchy-postgres-exporter images. * Github action kuttl tests are failing due to recent changes to PGO. Get pgmonitor files and set QUERIES_CONFIG_DIR. Co-authored-by: Chris Bandy <bandy.chris@gmail.com> --------- Co-authored-by: Drew Sessler <drew.sessler@crunchydata.com> Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Ability to set custom ccp_monitoring pass With this change users can update the <cluster>-monitoring secret with a password, in either the `stringData` or `data` secret fields, and remove the verifier to update the ccp_monitoring password in postgres. After this users will need to restart the exporter process by deleting the instance pods (a solution that doesn't require full pod restarts is coming).
This change is to support monitoring for standby clusters. Before this a standby cluster would be created without having access to the ccp_monitoring user password that was replicated from the primary cluster. Test to ensure that the postgres_exporter can scrape postgres using a custom ccp_monitoring password. The tests will: 1. create a cluster with exporter enabled and ensure metrics can be collected 2. Update the password and restart the pod 3. ensure that metrics can still be collected with the new password Tests now require jq to run - Refactor existing exporter tests - Split out the tls and no-tls tests into separate directories. - Update the tests to check the containers ready conditions - Add collectors for test failures - Include a test where we deploy a postgres-cluster with monitoring enabled on a replica. It will then check that the exporter on the replica can hit the query the database - Update exporter to use pass file - The exporter container now provides the ccp_monitoring password to postgres_exporter using a password file instead of an environment variable. With this, the password can be updated without requiring a container restart. The path to the password file has also been added to the exporter watcher logic meaning that the postgres_exporter process will be restarted when either the queries directory or password file change. - The password change test is updated to check that the postgres_exporter pid has been updated before trying to re-connect. - Update pgmonitor 4.9 - Update to pull pgMonitor 4.9 queries. The new version has a specific file for the global_dbsize metric that needs to be included when generating the default queries - Standby metrics testing - Now that the password for the monitoring user is configurable, users can configure a standby cluster to allow the exporter to query postgres using the ccp_monitoring user. This change implements testing to validate this use case. 
This test is included in e2e-other because it requires more work. We need to ensure a backup is complete before attempting to curl metrics. See note below* Note: Move standby and replica tests to e2e-other These two test can fail because of a scrape_error if a backup has not completed. They need to be updated to check that a backup is complete before attempting to collect metrics. There is a related story in our backlog. Due to the race condition, backup not being complete, they could pass or fail. After a backup chack is in place they should be able to move back into the e2e directory. * Exporter tests overwrite annotation As part of testing, kuttl will add annotations to instances pods before and after a change. Kuttl tests will continue to loop through the script until various conditions are met. This means that the annotation may be run more than once. If this happens, kubectl will complain that the annotation is already set and needs to be overwritten. This change adds the overwrite flag to kubectl annotate commands. * Replace most empty interface with "any" The "any" type has been available since Go 1.18 and is easier to read. * Add some badges to the README Co-authored-by: Greg Nokes <greg@nokes.name> * Change release announcements from mailing list to discord Co-authored-by: Greg Nokes <greg@nokes.name> * Initial scaffolding for standalone pgAdmin implementation Includes basic implementation for CRD, installation, and a dummy reconciliation loop. Reconciliation is feature-gated. * add initial changes for PG16 compatibility and pgMonitor 4.10.0 bump * Create and use a schema in kuttl tests to work around the change to CREATE permissions in public schema in pg15. * Reconcile a pgAdmin StatefulSet, Pod PVC and ConfigMap Add the reconciliation logic for the main initial elements for pgAdmin. 
Includes initial configuration options for the StatefulSet and example implementations for the ConfigMap, PVC and Status block * Update PostgresCluster example to RWO Volumes * pgAdmin use and discovery (#3739) * Reconcile admin secret * Working pgadmin with new image * Cluster config reconcile and load * add KUTTL test, fix item sorting * PR feedback and adding comments (#3741) * Remove admin as user-settable field (#3742) * Normalize our kustomize files and get rid of deprecation warnings. * Remove pgAdmin feature gate PGO-558 * Update OLM installer bundles for pgAdmin API updates This commit adds a new OLM example for the pgAdmin CRD and updates the bundle description file. * Update pgAdmin RBAC to match feature requirements Adjust kubebuilder markers and regenerate RBAC for current pgAdmin feature. PGO-565 * add pgadmin config (#3747) * remove unused kerberos env vars for now * add init container to write script to read from mounted config file Issue: PGO-547 * update service name (#3753) * update kuttl admin username * Remove pgAdmin Service The option to allow a user to create a service through the pgAdmin spec was pulled over from the v4.30 implentation. We are reevaluating how we want to handle services and have decided to remove service creation entirely for this round of pgAdmin changes. * Update naming to use pgAdmin CR instance UID Update all relevant names and impacted tests. Issue: PGO-591 * add username to pgadmin secret (#3757) * Grammar fix for README.md Issue: PGO-550 * Defines registration and encumbrance PGO-217 * Adds registration to unencumber PGO-431 * When requested, load the "citus" library first Issue: PGO-284 Issue: CrunchyData/postgres-operator#3194 * update versions * updated gis version * Add semantic versioning for registration. * updated github workflow Issue: PGO-353 * Update SSA expectations for recent versions of Kubernetes The fix for https://issue.k8s.io/116861 was backported a few times. 
* Relax an expectation in the Ticker tests The test is racy and occasionally fails with "4 (int) != 3 (int)". * update dependencies * Set SecurityContext for standalone pgAdmin to avoid filesystem permission issues. * Add DEFAULT_BINARY_PATH to standalone pgadmin startup script. * Simplify exporter tests using MarshalMatches Issue: PGO-635 * Move permanent flags into ExporterStartCommand Issue: PGO-635 * Move exporter TLS logic into generate method Issue: PGO-635 * Allow disabling of postgres_exporter defaults postgres_exporter provides default metrics, settings, and collectors. This change creates an annotation to allow disabling all of the postgres_exporter defaults. Co-authored-by: jmckulk <joseph.mckulka@crunchydata.com> Issue: PGO-635 * Update default upgrade test versions * Update olm bundling (#3786) Update olm bundling Issue: PGO-430 * Update Versions * Add Parallel Pod Management Policy to standalone pgadmin StatefulSet. * Add Parallel Pod Management Policy and RollingUpdate UpdateStrategy to repohost StatefulSet. This allows the Pod to recover from a bad rollout. * Add Parallel Pod Management Policy to postgrescluster-scoped pgadmin StatefulSet. This allows the Pod to recover from a bad rollout. * Upgrade opm version Issue: PGO-429 * Update OLM bundle description Issue: PGO-728 * Remove Docs * Reduces line length < 180 chars The RH Certified bundle GitHub pipeline requires lines of yaml to be less than 180 characters. Issue: PGO-728 * updated makefile and linter for doc deletions * Add Discord link to intro section of readme. * Update restore configuration file behavior Currently, the restore Pod will load both the pgBackRest configuration objects from `spec.datasource` and `spec.backups` sections. This commit updates that behavior so that only the `datasource.pgbackrest.configuration` is loaded when performing a cloud based restore. Issue: PGO-260 * Make standalone pgAdmin controller the owner of objects that it creates. 
* Update watchPods() behavior for Patroni role change Add an additional check to the existing watchPods function to queue an event when an instance Pod is first given the 'master' role. Issue: PGO-190 * Replace KUTTL 'empty-image-upgrade' with 'major-upgrade-missing-image' Now that the bug fix is in place, move 'major-upgrade-missing-image' back to the main testing folder in place of the subset test, 'empty-image-upgrade' Issue: PGO-190 * Use the queries collected in queries dir * Move standalone pgadmin test from e2e-other to e2e. * Add standalone pgadmin related image to github action test.yaml. * Update manager.yaml (#3816) * Tweak restart logic for exporter (#3817) * Tweak restart logic for exporter This separates out the kill/restart logic for the exporter; previously, if a file changed, we would kill/restart. This led to some test flakes where the restart would happen too quickly (hypothesis) before the port was free. This PR separates the logic: If the watched files change, kill the exporter; If no exporter is running, start the exporter. This also adds a check to the start_postgres_exporter func: save the PID to a file only if that proc is running. Issue: [PGO-420] * Update PGO configurations to support TDE This update builds on the exist custom configuration capabilities of PGO to allow a Postgres cluster to be configured to support of Transparent Data Encryption (TDE) in Postgres. Issue: PGO-779 * Revise cluster-pause/start tests (#3821) * Revise cluster-pause test * Use files for legibility * Add describe/log collectors to every assert * Change cluster-pause change from replica to service to speed up tests * pgadmin configuration keys can be alphanumeric see config.py docs * Additional TDE configuration for Patroni pg_rewind Provide a wrapper script to allow Patroni to invoke pg_rewind as required for TDE and update configuration to enable pg_rewind in Patroni for all versions > 10. 
Issue: PGO-785 * pgbackrest-restore KUTTL test namespace flag usage updates * Restores logo to README * Add additional configuration files to the restore Job Pod To support necessary TDE configurations, this commit adds any configured items, such as ConfigMaps or Secrets, from the `config` field to be mounted at `/etc/postgres` in the restore Job Pod's 'pgbackrest-restore' Container. Issue: PGO-909 * Bridge via PGO MVP Issue: [PGO-814] Co-authored-by: Benjamin Blattberg <benjamin.blattberg@gmail.com> * Update Copyright (#3818) Issue: [PGO-812] * Update pgMonitor version (#3837) * Update pgMonitor version Issue: [PGO-319] * Change name of bridge crd and files Issue: [PGO-915] * Update internal/bridge/crunchybridgecluster/crunchybridgecluster_controller.go Co-authored-by: Drew Sessler <36803518+dsessler7@users.noreply.github.com> * Update internal/bridge/crunchybridgecluster/crunchybridgecluster_controller.go Co-authored-by: Drew Sessler <36803518+dsessler7@users.noreply.github.com> * Bump golang.org/x/crypto to quiet an SSH CVE We use the PBKDF2 functions of this module for SCRAM verifiers. We do not use any of its SSH functionality. Issue: PGO-938 See: CVE-2023-48795 See: GHSA-45x7-px36-x8w8 * Bind pgAdmin to every IPv4 address by default The upstream default is "127.0.0.1", the IPv4 loopback address, which allows only local connections. Issue: CrunchyData/postgres-operator#3809 Issue: PGO-842 * Pull the trademarks forward * Add Crunchy Bridge Cluster adoption annotation logic. * Add params to bridge client methods. Add team_id to ListClusters params. * Cleanup EnvTest binaries more aggressively I had trouble running "make clean" while switching between Linux and macOS in the same working directory. * Pin the "controller-gen" workflow to Go 1.21 The tool panics in Go 1.22, and we don't want to bump to a compatible version just yet. * Pin test coverage workflows to Go 1.21 The Go 1.22 "go test" command fails when using the "-coverpkg" flag. 
See: https://go.dev/issue/65653 * Remove dependency licenses during the "clean" target While I was switching between old branches, these directories were left with Go code that breaks the "check" targets. * updated API fields to look kube native, and updated some API fields to names match the spec and status Issue: [PGO-910] * Updates to the Readme (#3855) * Updates to the Readme * Allow configuration of replica service through spec This change adds the ReplicaService field to the spec that gives users the same configuration options as other services (primary and pgbouncer). Notes: - go and kuttl tests have been added to confirm that the spec type is configured correctly - check, check-envtest, and check-envtest-existing are passing - check-kuttl tests are passing - connected to LoadBalancer service using GKE loadbalancer * Spelling (#3856) * spelling: adopt Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: case-sensitive Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: certificates Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: controller Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: current Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: directory Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: disconnected Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: github Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: identifier Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: independently Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: iterations Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: jqlang Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: mismatch Signed-off-by: Josh Soref 
<2119212+jsoref@users.noreply.github.com> * spelling: nonexistent Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: occurred Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: particularly Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: password Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: preexisting Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: remaining Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: requeuing Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: than Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: the Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: todo Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: utilized Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * spelling: version Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> --------- Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> * Use new images (#3858) * Allow customers to specify roles that they want credentials for in the CrunchyBridgeCluster spec. When specified, create corresponding Secret and fill with role name, password, and connection URI. If role is deleted from spec or secret name is changed, delete unused secret. * Separate bridge client structs and CBC API structs. Create separate request and response payload structs. Add fields to CBC status. * CBC Reconcile refactor. Added code to avoid overwriting secrets with the same name. Rename some API fields. Use pointers for booleans so false values still show up. Other minor changes. * Use resource package for k8s values and add code for conversion for values accepted/returned by bridge API. 
Co-authored-by: Chris Bandy <bandy.chris@gmail.com> * Remove nodePort check valid nodePorts can differ between clusters making it difficult to automate these checks. * Added upgrade conditions Issue:[PGO-916] * updated condition from updating to upgrading * Added logic to check if spec is invalid for an upgrade and return until spec is fixed. Also updated conditions * fix generate * fix e2e * fix tests * add `exposeReplicas` * gofmt the comment * bump k8s ver for tests * fix linter errors * fix upgrade-minor test * Revert "fix upgrade-minor test" This reverts commit e43a318. * revert `upgrade-minor` changes * increase timeout * fix upgrade --------- Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com> Co-authored-by: Drew Sessler <36803518+dsessler7@users.noreply.github.com> Co-authored-by: Chris Bandy <bandy.chris@gmail.com> Co-authored-by: Drew Sessler <drew.sessler@crunchydata.com> Co-authored-by: TJ Moore <tj.moore@crunchydata.com> Co-authored-by: Chris Bandy <chris.bandy@crunchydata.com> Co-authored-by: Joseph Mckulka <joseph.mckulka@crunchydata.com> Co-authored-by: Benjamin Blattberg <ben.blattberg@crunchydata.com> Co-authored-by: ValClarkson <valerie0149@gmail.com> Co-authored-by: szelenka <szelenka@gmail.com> Co-authored-by: Scott Zelenka <szelenka@cisco.com> Co-authored-by: Andrew L'Ecuyer <andrew.lecuyer@crunchydata.com> Co-authored-by: Tony Landreth <anthony.w.landreth@gmail.com> Co-authored-by: Sergey Pronin <spron-in@users.noreply.github.com> Co-authored-by: David Youatt <david.youatt@crunchydata.com> Co-authored-by: Val <ValClarkson@users.noreply.github.com> Co-authored-by: tjmoore4 <42497036+tjmoore4@users.noreply.github.com> Co-authored-by: Tony Landreth <56887169+tony-landreth@users.noreply.github.com> Co-authored-by: Roberto Mello <roberto.mello@gmail.com> Co-authored-by: Stefan Midjich <swehack@gmail.com> Co-authored-by: Stefan Midjich <stemid@users.noreply.github.com> Co-authored-by: David Jeffers <david@dajeffers.com> 
Co-authored-by: Anthony Landreth <tony.landreth@crunchydata.com> Co-authored-by: Greg Nokes <greg@nokes.name> Co-authored-by: ValClarkson <valerie.clarkson@crunchydata.com> Co-authored-by: Roman Gherta <roman.gherta@gmail.com> Co-authored-by: Benjamin Blattberg <benjamin.blattberg@gmail.com> Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com> Co-authored-by: Viacheslav Sarzhan <slava.sarzhan@percona.com> Co-authored-by: Natalia Marukovich <natalia.marukovich@percona.com>
Overview
When pgBackRest backups run on a schedule, PGO creates CronJobs but does not set a concurrencyPolicy on them, which can result in a large number of Pods being scheduled to run.
Environment
Platform: GKE
PGO Image Tag: ubi8-5.2.0-0
Postgres Version: 14
Additional Information
Kubernetes launches the Job as scheduled, but when the Pod for that Job exits with a non-success code, Kubernetes treats it as a failure and attempts to relaunch the Pod. That Pod also fails, so Kubernetes launches another one, and so on.
In this case, the Pods are failing because another Job's Pod is still performing the backup. Since we know this will always result in a failure, we should prevent multiple backups from the same CronJob from executing concurrently.
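To make the failure storm concrete, here is a rough simulation in plain Go. It is only a sketch: the `policy` type, the `simulate` function, and the numbers are illustrative stand-ins, not PGO or Kubernetes code. It assumes every Pod started while another backup is running fails, and that a failing Job relaunches its Pod `retries` times (6 matches the default Job backoffLimit, but the exact Pod count in a real cluster may differ).

```go
package main

import "fmt"

// policy is a local stand-in for the CronJob concurrencyPolicy values.
type policy string

const (
	allow  policy = "Allow"
	forbid policy = "Forbid"
)

// simulate counts what happens to the scheduled runs that fire while an
// earlier backup is still in progress.
//   - overlappingTicks: schedule ticks that fire during the long backup
//   - retries: how many times a failed Job relaunches its Pod (assumed)
func simulate(p policy, overlappingTicks, retries int) (failedPods, skippedRuns int) {
	for i := 0; i < overlappingTicks; i++ {
		if p == forbid {
			// The new run is skipped because the previous one is still active.
			skippedRuns++
			continue
		}
		// The new Job starts anyway; its first Pod fails because a backup
		// is already running, and every retry fails the same way.
		failedPods += 1 + retries
	}
	return failedPods, skippedRuns
}

func main() {
	for _, p := range []policy{allow, forbid} {
		failed, skipped := simulate(p, 3, 6)
		fmt.Printf("%s: failedPods=%d skippedRuns=%d\n", p, failed, skipped)
	}
}
```

With three overlapping ticks, the "Allow" case piles up 21 failed Pods in this model, while "Forbid" produces none and simply skips three runs.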
It may be worthwhile to expose some of the other CronJob settings as well, but setting concurrencyPolicy to Forbid should solve most of the noise.
Is this the only place we'd need to add this?
https://github.com/CrunchyData/postgres-operator/blob/master/internal/controller/postgrescluster/pgbackrest.go#L2877-L2890
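For reference, a minimal sketch of what the change could look like with the `k8s.io/api/batch/v1` types. The helper name `newBackupCronJob`, the CronJob name, and the schedule are hypothetical and are not taken from pgbackrest.go. Only the `ConcurrencyPolicy` field and the `ForbidConcurrent` constant are the point here. Building it needs the `k8s.io/api` and `k8s.io/apimachinery` modules.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newBackupCronJob is a hypothetical helper; the real operator builds its
// CronJob elsewhere and sets many more fields (job template, labels, owner).
func newBackupCronJob(name, schedule string) *batchv1.CronJob {
	return &batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: batchv1.CronJobSpec{
			Schedule: schedule,
			// Skip a scheduled run while the previous one is still active,
			// instead of starting a Job that is known to fail.
			ConcurrencyPolicy: batchv1.ForbidConcurrent,
		},
	}
}

func main() {
	// Name and schedule are made-up example values.
	cj := newBackupCronJob("example-repo1-full", "0 1 * * 0")
	fmt.Println(cj.Spec.ConcurrencyPolicy)
}
```

Using the package constant rather than the literal string keeps the value in sync with the API types.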