Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when using SharedIndexInformer #6354

Closed
bzhangyan11 opened this issue Sep 18, 2024 · 6 comments · Fixed by #6356
Closed

Deadlock when using SharedIndexInformer #6354

bzhangyan11 opened this issue Sep 18, 2024 · 6 comments · Fixed by #6356
Assignees
Labels
Milestone

Comments

@bzhangyan11
Copy link

bzhangyan11 commented Sep 18, 2024

Describe the bug

My service couldn't receive any events from apiserver and I found some deadlocks in jstack.But i couldn't find the synchronizer 0x00000006c049ca40 which is held by "OkHttp Dispatcher"

https://github.com/openjdk/jdk8u/blob/jdk8u332-ga/jdk/src/share/classes/sun/security/ssl/SSLSocketImpl.java#L935

3968 Found one Java-level deadlock:
3969 =============================
3970 "OkHttp Dispatcher":
3971   waiting to lock monitor 0x00007f6a47f56718 (object 0x00000006c348e388, a io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager),
3972   which is held by "-1504937617-pool-11-thread-6"
3973 "-1504937617-pool-11-thread-6":
3974   waiting for ownable synchronizer 0x00000006c049ca40, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
3975   which is held by "OkHttp Dispatcher"
3976 
3977 Java stack information for the threads listed above:
3978 ===================================================
3979 "OkHttp Dispatcher":
3980     at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.scheduleReconnect(AbstractWatchManager.java:233)
3981     - waiting to lock <0x00000006c348e388> (a io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager)
3982     at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.watchEnded(AbstractWatchManager.java:433)
3983     at io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager.lambda$null$1(WatchHTTPManager.java:82)
3984     at io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager$$Lambda$745/1901952678.accept(Unknown Source)
3985     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
3986     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
3987     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
3988     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
3989     at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:141)
3990     at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody$$Lambda$703/1759056657.run(Unknown Source)
3991     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
3992     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                        
3993     at java.lang.Thread.run(Thread.java:879)
3994 "-1504937617-pool-11-thread-6":
3995     at sun.misc.Unsafe.park0(Native Method)
3996     - parking to wait for  <0x00000006c049ca40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
3997     at sun.misc.Unsafe.park(Unsafe.java:1038)
3998     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:176)
3998     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:176)
3999     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:842)
4000     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:876)
4001     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1207)
4002     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
4003     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
4004     at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:935)
4005     at okio.InputStreamSource.read(JvmOkio.kt:90)
4006     at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:129)
4007     at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:427)
4008     at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:320)
4009     at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:105)
4010     at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.readChunkSize(Http1ExchangeCodec.kt:426)
4011     at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.read(Http1ExchangeCodec.kt:408)
4012     at okhttp3.internal.Util.skipAll(Util.kt:344)
4013     at okhttp3.internal.Util.discard(Util.kt:365)
4014     at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.close(Http1ExchangeCodec.kt:450)
4015     at okio.ForwardingSource.close(ForwardingSource.kt:34)
4016     at okhttp3.internal.connection.Exchange$ResponseBodySource.close(Exchange.kt:309)
4017     at okio.RealBufferedSource.close(RealBufferedSource.kt:477)
4018     at io.fabric8.kubernetes.client.utils.Utils.closeQuietly(Utils.java:206)
4019     at io.fabric8.kubernetes.client.utils.Utils.closeQuietly(Utils.java:220)
4020     at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.cancel(OkHttpClientImpl.java:154)
4021     at io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager$$Lambda$2522/246463238.accept(Unknown Source)
4022     at java.util.Optional.ifPresent(Optional.java:159)
4023     at io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager.closeCurrentRequest(WatchHTTPManager.java:93)
4024     - locked <0x00000006c348e388> (a io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager)
4025     at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.closeRequest(AbstractWatchManager.java:161)
4026     - locked <0x00000006c348e388> (a io.fabric8.kubernetes.client.dsl.internal.WatchHTTPManager)
4027     at io.fabric8.kubernetes.client.informers.impl.cache.Reflector$$Lambda$2520/721648509.accept(Unknown Source)
4028     at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
4029     at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683)
4030     at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010)
4031     at io.fabric8.kubernetes.client.informers.impl.cache.Reflector.lambda$startWatcher$12(Reflector.java:239)
4032     at io.fabric8.kubernetes.client.informers.impl.cache.Reflector$$Lambda$739/2045679137.run(Unknown Source)
4033     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
4034     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
4035     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                        
4036     at java.lang.Thread.run(Thread.java:879)

Fabric8 Kubernetes Client version

6.13.3

Steps to reproduce

I don't know how to reproduce it.

Expected behavior

no deadlock

Runtime

Kubernetes (vanilla)

Kubernetes API Server version

other (please specify in additional context)

Environment

Linux

Fabric8 Kubernetes Client Logs

No response

Additional context

No response

@bzhangyan11 bzhangyan11 changed the title deadlock when using SharedIndexInformer Deadlock when using SharedIndexInformer Sep 18, 2024
@shawkins
Copy link
Contributor

http watches unfortunately haven't gotten as much scrutiny as websocket watches, and this does indeed look like a deadlock in the current state of that code. Are you choosing to use http watches or is that happening for you?

@shawkins
Copy link
Contributor

Also this appears to be okhttp specific. You could try one of the other http clients instead.

shawkins added a commit to shawkins/kubernetes-client that referenced this issue Sep 18, 2024
close: fabric8io#6354

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
@manusa manusa added the bug label Sep 18, 2024
@bzhangyan11
Copy link
Author

bzhangyan11 commented Sep 19, 2024

http watches unfortunately haven't gotten as much scrutiny as websocket watches, and this does indeed look like a deadlock in the current state of that code. Are you choosing to use http watches or is that happening for you?

yes, i'm using http watches.

@bzhangyan11
Copy link
Author

Previously,my service got stock with websocket(can't receive any events from apiserver),so i upgraded fabric8 from 5.X to 6.13.3 and chose http watches instead.

@bzhangyan11
Copy link
Author

bzhangyan11 commented Sep 19, 2024

http watches unfortunately haven't gotten as much scrutiny as websocket watches, and this does indeed look like a deadlock in the current state of that code. Are you choosing to use http watches or is that happening for you?

and it seems that this issue only occurs in specific versions of the JDK.
https://github.com/openjdk/jdk8u/blob/jdk8u332-ga/jdk/src/share/classes/sun/security/ssl/SSLSocketImpl.java#L935

works well in this version of SSLSocketImpl.
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/security/ssl/SSLSocketImpl.java
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/security/ssl/AppInputStream.java

@shawkins
Copy link
Contributor

@bzhangyan11 thanks for the additional context and links. The proposed pr should be safe regardless - our http client layer over okhttp attempts to make it appear as an async api, but it's actually blocking. We hadn't seen this blocking on close behavior before - which you are pointing out looks to even be JRE specific. So if you can't change your JRE, and you need to keep using http watches, the only other near term workaround is to use a different http client.

@manusa manusa added this to the 7.0.0 milestone Sep 20, 2024 — with automated-tasks
@manusa manusa closed this as completed in b5c9a5c Sep 20, 2024
manusa pushed a commit that referenced this issue Sep 25, 2024
close: #6354

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
Signed-off-by: Marc Nuri <marc@marcnuri.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants