Skip to content
This repository has been archived by the owner on Jan 16, 2024. It is now read-only.

HTTP2Transport race may lead to deadlock #10

Closed
jshslt opened this issue May 22, 2017 · 5 comments
Closed

HTTP2Transport race may lead to deadlock #10

jshslt opened this issue May 22, 2017 · 5 comments
Assignees
Milestone

Comments

@jshslt
Copy link

jshslt commented May 22, 2017

Hi,
I think there's a race in HTTP2Transport that can lead to deadlock if the networkLoop thread tries to signal out a successful connection, while another thread has asked for the network thread to terminate (and is waiting to join).

full scenario -
A - if any client thread calls MessageRouter::disable. MessageRouter lock is taken. MessageRouter::disconnectAllTransportsLocked is called. the transports are iterated, HTTP2Transport::disconnect is called, which calls join() on the SDK network thread.

B - SDK network thread, when connection is just established, HTTP2Transport::networkLoop calls the onConnected callback MessageRouter::onConnected, which tries to take the MessageRouter lock.

if B happens while A is in process, deadlock.

please advise?

@sanjayrd
Copy link
Contributor

Thanks for pointing this out @jshslt. We haven't had a chance to dive deep into this yet, but we will definitely investigate the matter to see what we can do.

In the meantime, could you expand on the use case of this? More specifically, is this an issue that you actually foresee happening in your production code or just something you noticed as you were going through the code?

Thanks!
Sanjay

@jshslt
Copy link
Author

jshslt commented May 23, 2017

@sanjayrd we happened to hit it for real. We've been getting a lot of SERVER_SIDE_DISCONNECTs, and sometimes the SDK appears to get in a PENDING->CONNECTED->PENDING->CONNECTED loop because of these - when we catch this starting to happen, we sometimes forcibly and fully disconnect the transport with MessageRouter::disconnect(), to rest a bit before trying to reconnect again.

@garmin-coleman
Copy link

@jshslt - FYI we were also seeing the PENDING->CONNECTED loop you mention. For us it would reliably happen if we accidentally had two clients using the same refresh token. The two clients would essentially ping-pong, causing the server to kick the other off when reconnecting.

@JamieMeyers JamieMeyers added the bug label Jun 2, 2017
@JamieMeyers
Copy link
Contributor

We have confirmed the problem and are working on a fix.

Thanks,
Jamie

@yugoren yugoren added the ACL label Jun 2, 2017
@scotthea-amazon scotthea-amazon self-assigned this Jun 8, 2017
kjkh pushed a commit that referenced this issue Jun 9, 2017
Changes in this update
-Implemented Sensory wake word detector functionality
-Removed the need for a std::recursive_mutex in MessageRouter
-Added AIP unit test
-Added handleDirectiveImmediately functionality to SpeechSynthesizer
-Added memory profiles for:
AIP
SpeechSynthesizer
ContextManager
AVSUtils
AVSCommon
-Bug fix for MultipartParser.h compiler warning
-Suppression of sensitive log data even in debug builds. Use cmake parameter -DACSDK_EMIT_SENSITIVE_LOGS=ON to allow logging of sensitive information in DEBUG builds
-Fix crash in ACL when attempting to use more than 10 streams
-Updated MediaPlayer to use autoaudiosink instead of requiring pulseaudio
-Updated MediaPlayer build to suppport local builds of GStreamer
-Fixes for the following Github issues:
#5
#8
#9
#10
#17
#24
@scotthea-amazon
Copy link
Contributor

A fix for this has been pushed in version 0.4.1.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants