Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for primary endpoint ready #46

Merged
merged 9 commits into from
Sep 27, 2022
Merged

Conversation

marceloneppel
Copy link
Member

@marceloneppel marceloneppel commented Sep 22, 2022

Issue

  • When relating the PostgreSQL k8s charm with other charm before the PostgreSQL charm is fully initialised (in an Active state), we can face some errors related to database connection problems. This can be checked by testing the relation between this charm and the PgBouncer k8s charm.
  • Those problems happen due the PostgreSQL charm primary k8s endpoint/service is not yet correctly pointing to the primary pod (it takes some time to work).
  • Jira issue: DPE-701

Solution

  • Avoid setting the cluster_initialised flag (that is checked in multiple charm events) before the primary endpoint/service is fully ready by checking its accessibility.

Context

  • Files to FOCUS when reviewing this PR:

    • src/charm.py: has the call to the check, an update to the upgrade charm event handler to avoid problems withe the primary endpoint being deleted when the pod is killed and also an update to the update status event holder to avoid logging error messages when the PostgreSQL/Patroni pebble service is not ready yet.

    • src/patroni.py: has the implementation of the primary endpoint/service readiness check.

    • src/resources.yaml has an additional port that is used to contact the primary endpoint during the initial check on charm.py pebble ready event.

    • tests/integration/new_relations/test_new_relations.py has an update to merge two tests in one (in a way that makes the PostgreSQL charm to be related faster to the application charm; this avoids any kind of regression on this new implementation).

  • Extra details about the other files:

    • tests/unit/test_charm.py and tests/unit/test_patroni.py were updated just to match the new implementations.

Testing

  • Unit tests were updated.
  • Integration tests for the relation were changed to (the same will work for legacy relations, but the charms that are used on those tests work differently, so they don't trigger the specific issue that is being solved through this PR).
  • The changes from this PR were manually tested by deploying this charm, the PgBouncer k8s charm and immediately relating them (which triggers the problem before these changes).

Release Notes

  • Add check for started/ready primary endpoint/service.

@marceloneppel marceloneppel changed the title Add check for started service Add check for primary endpoint ready Sep 26, 2022
@marceloneppel marceloneppel marked this pull request as ready for review September 26, 2022 10:11
WRFitch
WRFitch previously approved these changes Sep 26, 2022
src/charm.py Show resolved Hide resolved
Copy link

@MiaAltieri MiaAltieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!! One minor suggestion

src/patroni.py Outdated
except RetryError:
return False

return r.json()["state"] == "running"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the state isn't running is it worth retrying? ie

        try:
            for attempt in Retrying(stop=stop_after_delay(10), wait=wait_fixed(3)):
                with attempt:
                    r = requests.get(
                        f"{'https' if self._tls_enabled else 'http'}://{self._primary_endpoint}:8008/health",
                        verify=self._verify,
                    )
                    if not r.json()["state"] == "running":
                         raise endpointNotReadyError
        except RetryError: 
            return False

        return True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is a great improvement! I will implement it. Thanks Mia!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated on fbb4332.

await ops_test.model.add_relation(
f"{APPLICATION_APP_NAME}:{FIRST_DATABASE_RELATION_NAME}", DATABASE_APP_NAME
)
await ops_test.model.wait_for_idle(apps=APP_NAMES, status="active", raise_on_blocked=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleverly tested :)

@marceloneppel
Copy link
Member Author

Added an additional test for relation broken on tests/integration/new_relations/test_new_relations.py via 3b61943. The updated library code was ok on the VM charm but it was missing here, so I updated here and added the test. Now the relations are more robust.

Copy link

@MiaAltieri MiaAltieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, really good work here :)

@marceloneppel marceloneppel merged commit 7b5f706 into main Sep 27, 2022
@marceloneppel marceloneppel deleted the fixes-patroni-api-check branch September 27, 2022 09:12
BON4 pushed a commit to BON4/postgresql-k8s-operator that referenced this pull request May 20, 2024
* Add check for started service

* Add check for primary endpoint being ready

* Update charm unit tests

* Add unit test

* Fix unit test

* Make property private

* Improve primary endpoint check

* Revert change

* Add test for relation broken
github-actions bot added a commit to canonical/test-runners-2-github-x64-postgresql-k8s-operator that referenced this pull request May 23, 2024
github-actions bot added a commit to canonical/test-runners-2-is-x64-postgresql-k8s-operator that referenced this pull request May 23, 2024
github-actions bot added a commit to canonical/test-runners-2-azure-arm64-postgresql-k8s-operator that referenced this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants