Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicated checks in health check response #32800

Closed
MigotMateusz opened this issue Apr 20, 2023 · 13 comments · Fixed by #35240
Closed

Duplicated checks in health check response #32800

MigotMateusz opened this issue Apr 20, 2023 · 13 comments · Fixed by #35240
Labels
area/health area/smallrye kind/bug Something isn't working triage/upstream Used for issues which are caused by issues in upstream projects/dependency
Milestone

Comments

@MigotMateusz
Copy link

MigotMateusz commented Apr 20, 2023

Describe the bug

When quarkus app is deployed on environment with multiple instances checks in health check response are repeated.
There are 3 instances and one instance returns expected behavior, second instance returns doubled response and third instance returns tripled instance.

Expected behavior

{
	"status": "UP",
	"checks": [
		{
			"name": "Kafka consumers health check",
			"status": "UP"
		},
		{
			"name": "Database connections health check",
			"status": "UP",
			"data": {
				"<default>": "UP"
			}
		}
	]
}

Actual behavior

{
	"status": "UP",
	"checks": [
		{
			"name": "Kafka consumers health check",
			"status": "UP"
		},
		{
			"name": "Kafka consumers health check",
			"status": "UP"
		},
		{
			"name": "Kafka consumers health check",
			"status": "UP"
		},
		{
			"name": "Database connections health check",
			"status": "UP",
			"data": {
				"<default>": "UP"
			}
		},
		{
			"name": "Database connections health check",
			"status": "UP",
			"data": {
				"<default>": "UP"
			}
		},
		{
			"name": "Database connections health check",
			"status": "UP",
			"data": {
				"<default>": "UP"
			}
		}
	]
}

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.14.3.Final

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@MigotMateusz MigotMateusz added the kind/bug Something isn't working label Apr 20, 2023
@quarkus-bot
Copy link

quarkus-bot bot commented Apr 20, 2023

/cc @jmartisk (health), @xstefank (health)

@xstefank
Copy link
Member

xstefank commented Apr 20, 2023

This is something strange. Can you give us some reproducer please?

I'm not able to reproduce this.

@geoand geoand added the triage/needs-reproducer We are waiting for a reproducer. label Apr 21, 2023
@roy-mdsol
Copy link

Im facing a similar bug. this happens only when I deploy the code onto the server. It doesn't happen when running in Dev mode.

@geoand
Copy link
Contributor

geoand commented May 8, 2023

We'll need a sample application in order to investigate this

@geoand
Copy link
Contributor

geoand commented May 22, 2023

Closing as we did not get a reproducer. Feel free to comment with any additional information if it becomes available

@geoand geoand closed this as not planned Won't fix, can't repro, duplicate, stale May 22, 2023
@bushrod
Copy link

bushrod commented Jun 16, 2023

This is happening to me too (also only on server, not locally). I logged the times and saw it is being called twice in rapid succession. It is duplicated even if the check name is identical with the differing timestamps in data. Maybe the checks could be de-dupified, if not fixed by not calling multiple times, by adding the check names to a set and converting that to array. I tried returning null from the custom check instead of up if the last call was < some milliseconds, but health reported down because of the null response.

{
    "status": "UP",
    "checks": [
        {
            "name": "2023-06-15T18:56:08.724432",
            "status": "UP"
        },
        {
            "name": "2023-06-15T18:56:08.724675",
            "status": "UP"
        }
    ]
}

@Kazeheki
Copy link

Kazeheki commented Aug 4, 2023

The issue only occurs when you set up liveness/readiness probes in your cluster.
As long as I didn't add them, the duplication didn't happen.
I created a sample project: https://github.com/Kazeheki/quarkus-smallrye-health-duplication-example

@geoand geoand reopened this Aug 4, 2023
@geoand
Copy link
Contributor

geoand commented Aug 7, 2023

@Kazeheki thanks a lot for the reproducer!

Mind explaining what you did with that project and saw the duplicate checks?

@Kazeheki
Copy link

Kazeheki commented Aug 7, 2023

@geoand I deployed the application in a cluster and set liveness/readiness probes.
If you run the start-script in the reproducer, it will setup a local kubernetes cluster for you (using k3d). In the file k8s.conf.yaml you see how I defined the deployment and the liveness/readiness probes.
The application itself basically only consists of a liveness and readiness probe.

I did nothing more than deploying it in a cluster with those probes and opening the health request localhost:18080/q/health in a browser.
If you leave out the readiness/liveness (comment out l.27-44) probes in the kubernetes definitions and apply the 'k8s.conf.yaml' (kubectl apply -f k8s.conf.yaml) to the cluster again and reload the health page, then the duplication is gone.

@geoand
Copy link
Contributor

geoand commented Aug 7, 2023

Thanks!

Was there anyway you could reproduce this while running the Quarkus application locally?

@Kazeheki
Copy link

Kazeheki commented Aug 7, 2023

I didn't manage to see the duplication in any other case than the liveness/readiness probe usage.

I tried to setup a scheduler to call the health endpoints every second which didn't have the same outcome with duplication as cluster liveness/readiness probes.

I also tried to do calls from outside (e.g. watch -n1 curl localhost:8080/q/health/live) which also didn't cause the duplication.

I can only recreate the duplication in a kubernetes cluster with liveness/readiness probes.

@geoand
Copy link
Contributor

geoand commented Aug 7, 2023

Gotcha, thanks!

geoand added a commit to geoand/smallrye-health that referenced this issue Aug 7, 2023
The methods of this class can (and are) called
concurrently by Quarkus
so there needs to be some control.

This PR provides the absolute basic form
(both in terms of coverage and in terms
of idioms used)

Relates to: quarkusio/quarkus#32800
@geoand
Copy link
Contributor

geoand commented Aug 7, 2023

@xstefank it seems like SmallryeHealthReporter lacks the necessary concurrency controls. I opened smallrye/smallrye-health#486 to address this, but I would not concider the ultimate solution,
it's just a quick fix for this issue.
My recommendation would be to merge it, release SR Health to include it, update Quarkus and when you have the time,
you look into more thorough concurrency coverage.

@Kazeheki Thanks again for the reproducer, it was most helpful! I could reliably reproduce the issue using it and was able to test my fix against it.

@geoand geoand added triage/upstream Used for issues which are caused by issues in upstream projects/dependency and removed triage/needs-reproducer We are waiting for a reproducer. labels Aug 7, 2023
geoand added a commit to geoand/smallrye-health that referenced this issue Aug 7, 2023
The methods of this class can (and are) called
concurrently by Quarkus
so there needs to be some control.

This PR provides the absolute basic form
(both in terms of coverage and in terms
of idioms used)

Relates to: quarkusio/quarkus#32800
@quarkus-bot quarkus-bot bot added this to the 3.4 - main milestone Aug 11, 2023
@gsmet gsmet modified the milestones: 3.4 - main, 3.3.0 Aug 11, 2023
@gsmet gsmet removed this from the 3.3.0 milestone Aug 24, 2023
@gsmet gsmet added this to the 3.2.5.Final milestone Aug 24, 2023
benkard pushed a commit to benkard/mulkcms2 that referenced this issue Aug 29, 2023
This MR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [flow-bin](https://github.com/flowtype/flow-bin) ([changelog](https://github.com/facebook/flow/blob/master/Changelog.md)) | devDependencies | minor | [`^0.214.0` -> `^0.215.0`](https://renovatebot.com/diffs/npm/flow-bin/0.214.0/0.215.1) |
| [org.liquibase:liquibase-maven-plugin](http://www.liquibase.org/liquibase-maven-plugin) ([source](https://github.com/liquibase/liquibase)) | build | patch | `4.23.0` -> `4.23.1` |
| [io.quarkus:quarkus-maven-plugin](https://github.com/quarkusio/quarkus) | build | minor | `3.2.3.Final` -> `3.3.0` |
| [io.quarkus:quarkus-universe-bom](https://github.com/quarkusio/quarkus-platform) | import | minor | `3.2.3.Final` -> `3.3.0` |
| [org.apache.maven.plugins:maven-enforcer-plugin](https://maven.apache.org/enforcer/) | build | minor | `3.3.0` -> `3.4.0` |

---

### Release Notes

<details>
<summary>flowtype/flow-bin</summary>

### [`v0.215.1`](flow/flow-bin@a92ce80...cbb038f)

[Compare Source](flow/flow-bin@a92ce80...cbb038f)

### [`v0.215.0`](flow/flow-bin@ca11e28...a92ce80)

[Compare Source](flow/flow-bin@ca11e28...a92ce80)

</details>

<details>
<summary>liquibase/liquibase</summary>

### [`v4.23.1`](https://github.com/liquibase/liquibase/blob/HEAD/changelog.txt#Liquibase-4231-is-a-patch-release)

[Compare Source](liquibase/liquibase@v4.23.0...v4.23.1)

</details>

<details>
<summary>quarkusio/quarkus</summary>

### [`v3.3.0`](https://github.com/quarkusio/quarkus/releases/tag/3.3.0)

[Compare Source](quarkusio/quarkus@3.2.4.Final...3.3.0)

##### Complete changelog

-   [#&#8203;35350](quarkusio/quarkus#35350) - Fix package type system property clearing
-   [#&#8203;35348](quarkusio/quarkus#35348) - quarkus-maven-plugin runs native building even if the profile is commented out
-   [#&#8203;35343](quarkusio/quarkus#35343) - ArC: fix StackOverflowError in AutoAddScopeBuildItem
-   [#&#8203;35319](quarkusio/quarkus#35319) - Register arrays of Hibernate ORM's JDBC basic types for reflection
-   [#&#8203;35315](quarkusio/quarkus#35315) - Fix Datasource timing issues with Liquibase / Flyway and OpenTelemetry
-   [#&#8203;35314](quarkusio/quarkus#35314) - Regression in 3.3.0.CR1: Synthetic bean instance for io.opentelemetry.api.OpenTelemetry not initialized yet
-   [#&#8203;35312](quarkusio/quarkus#35312) - Updates Infinispan to 14.0.13.Final
-   [#&#8203;35308](quarkusio/quarkus#35308) - Lock jib execution to avoid OverlappingFileLockException in parallel builds
-   [#&#8203;35305](quarkusio/quarkus#35305) - Fix the titles of the tables in RESTEasy Reactive doc
-   [#&#8203;35302](quarkusio/quarkus#35302) - Docs: Mention wilcard support in resteasy reactive XML serialisation exclude classes configuration
-   [#&#8203;35301](quarkusio/quarkus#35301) - Fix potential NPE in quarkus-csrf-reactive when no MediaType is found
-   [#&#8203;35299](quarkusio/quarkus#35299) - Output build graph using `quarkus.builder.graph-output` property
-   [#&#8203;35285](quarkusio/quarkus#35285) - NullPointerException during http post request when quarkus-csrf-reactive extension is added to a project
-   [#&#8203;35283](quarkusio/quarkus#35283) - Upgrade proto-google-common-protos to 2.23.0
-   [#&#8203;35282](quarkusio/quarkus#35282) - Avoid keeping references to BytecodeRecorderImpl
-   [#&#8203;35276](quarkusio/quarkus#35276) - Reinstate DevModeTestUtil to avoid breaking other projects that depend on it
-   [#&#8203;35273](quarkusio/quarkus#35273) - Fix small typo in comment
-   [#&#8203;35263](quarkusio/quarkus#35263) - Stop the recovery service while ArC is still around
-   [#&#8203;35245](quarkusio/quarkus#35245) - Add missing info to init Jobs
-   [#&#8203;35244](quarkusio/quarkus#35244) - Init Jobs are missing ServiceAccount and Image Pull Secrets
-   [#&#8203;35240](quarkusio/quarkus#35240) - Update SmallRye Health to 4.0.4
-   [#&#8203;34071](quarkusio/quarkus#34071) - 3.1.1 Final - java.lang.IllegalArgumentException: Class java.util.UUID\[] is instantiated reflectively but was never registered
-   [#&#8203;32800](quarkusio/quarkus#32800) - Duplicated checks in health check response
-   [#&#8203;11903](quarkusio/quarkus#11903) - Gradle multimodule project + quarkus-container-image-jib: OverlappingFileLockException

### [`v3.2.4.Final`](https://github.com/quarkusio/quarkus/releases/tag/3.2.4.Final)

[Compare Source](quarkusio/quarkus@3.2.3.Final...3.2.4.Final)

##### Complete changelog

-   [#&#8203;35300](quarkusio/quarkus#35300) - Fix `jandex-gradle-plugin-version` in CDI Reference
-   [#&#8203;35296](quarkusio/quarkus#35296) - Upgrade H2 to 2.2.220
-   [#&#8203;35258](quarkusio/quarkus#35258) - CDI Reference 1.1 has incomplete information for gradle
-   [#&#8203;35255](quarkusio/quarkus#35255) - Quartz: QuarkusMSSQLDelegate should extends MSSQLDelegate
-   [#&#8203;35211](quarkusio/quarkus#35211) - Document Maven config options that may be relevant when running tests
-   [#&#8203;35203](quarkusio/quarkus#35203) - Pass Maven user settings when initializing artifact resolver
-   [#&#8203;35193](quarkusio/quarkus#35193) - OpenTelemetry service name should have higher priority than app name when no resource attributes are set
-   [#&#8203;35189](quarkusio/quarkus#35189) - Quarkus CLI fixes
-   [#&#8203;35188](quarkusio/quarkus#35188) - SmallRyeGraphQLOverWebSocketHandler: use order value > Integer.MIN_VALUE
-   [#&#8203;35181](quarkusio/quarkus#35181) - REST Data with Panache should not produce links when hal is disabled
-   [#&#8203;35174](quarkusio/quarkus#35174) - Ensure the narayana-jta extension fully shuts down the recovery manager
-   [#&#8203;35172](quarkusio/quarkus#35172) - Kafka Streams: restore the feature name at Quarkus startup
-   [#&#8203;35171](quarkusio/quarkus#35171) - kafka-streams: feature not listed on startup
-   [#&#8203;35165](quarkusio/quarkus#35165) - Propagate all user methods in REST Data with Panache
-   [#&#8203;35160](quarkusio/quarkus#35160) - Properly use internal links to point to other guides
-   [#&#8203;35140](quarkusio/quarkus#35140) - ArC: fix deadlock when calling guarded methods on the same thread
-   [#&#8203;35136](quarkusio/quarkus#35136) - Deadlock while calling write-locked method from read-locked method
-   [#&#8203;34908](quarkusio/quarkus#34908) - `@RouteFilter` stopped working with WebSocket requests Quarkus 3.2.0.Final
-   [#&#8203;34875](quarkusio/quarkus#34875) - Quarkus build does not work since 3.2.0 with teamcity/plexus launcher
-   [#&#8203;34713](quarkusio/quarkus#34713) - Option to track build configuration for changes between builds
-   [#&#8203;34576](quarkusio/quarkus#34576) - Live reload stopped working on 3.2 when using XA transactions
-   [#&#8203;34514](quarkusio/quarkus#34514) - Support `@WithUnnamedKey` in documentation
-   [#&#8203;34065](quarkusio/quarkus#34065) - Add support for project Java version update based on extensions
-   [#&#8203;33317](quarkusio/quarkus#33317) - OpenTelemetry SDK autoconfiguration ignores OTEL service name in favor of Quarkus app name
-   [#&#8203;15461](quarkusio/quarkus#15461) - Quarkus tests fails mTLS authentication against internal Maven repository

</details>

<details>
<summary>quarkusio/quarkus-platform</summary>

### [`v3.3.0`](quarkusio/quarkus-platform@3.2.4.Final...3.3.0)

[Compare Source](quarkusio/quarkus-platform@3.2.4.Final...3.3.0)

### [`v3.2.4.Final`](quarkusio/quarkus-platform@3.2.3.Final...3.2.4.Final)

[Compare Source](quarkusio/quarkus-platform@3.2.3.Final...3.2.4.Final)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever MR is behind base branch, or you tick the rebase/retry checkbox.

👻 **Immortal**: This MR will be recreated if closed unmerged. Get [config help](https://github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [x] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNC4yNC4wIiwidXBkYXRlZEluVmVyIjoiMzQuMjQuMCJ9-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/health area/smallrye kind/bug Something isn't working triage/upstream Used for issues which are caused by issues in upstream projects/dependency
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants