Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of data channels #4536

Merged
merged 16 commits into from
Jan 3, 2025
Merged

Conversation

danxuliu
Copy link
Member

Follow up to #3670

This was part of an upcoming pull request but was extracted for easier reviewing. It adjusts some data channel related code that is either needed for the other pull request (queuing messages if the data channel was not open yet), was incorrect (not disposing of all data channels, sending messages on data channels other than status) or could be improved (missing logs, documentation and tests). Please refer to the individual commit messages for details.

As data channels involve several threads (the events are received in the so called WebRTC signaling thread (unrelated to the thread were signaling messages are received), but the PeerConnectionWrapper can be used from RxJava executors, the main thread, the WebRTC signaling thread itself...) synchronization was needed in certain points. There are probably better ways to do the synchronization, but for simplicity I just resorted to synchronized methods/blocks and catching exceptions. Similarly the unit tests for the possible synchronization problems were done in a brute force approach, just repeating the test many times, so all this has a lot of room for improvements... In any case, thread safety should be kept in mind for future changes, as there are probably other areas of the code that might need to be adjusted 🤷

Besides that, adding the logs unveiled a bug that seems to have been there since forever: when the HPB is used the nick is tried to be sent even after the call activity is destroyed (and the PeerConnectionWrapper is also leaked). There is something wrong with the use of interval and repeatUntil, but that will be rewritten anyway in the upcoming pull request, so for now it stays as is.

Finally, one thing that still should be implemented (but it will not be implemented either in the other pull request) is disabling data channels so they are not even negotiated when not needed (for example, screen connections). This is not a big deal, however, as the data channel will be blocked by the other end, and even if it is opened it should not cause any trouble unless the Android app sends conflicting messages :-)

@danxuliu danxuliu added bug Something isn't working 3. to review Waiting for reviews labels Dec 11, 2024
@danxuliu
Copy link
Member Author

/backport to stable-20.1

@danxuliu danxuliu force-pushed the improve-handling-of-data-channels branch from 4ab45b9 to 1ac2c41 Compare December 11, 2024 05:44
@nickvergessen nickvergessen removed the request for review from sowjanyakch December 16, 2024 11:21
@danxuliu danxuliu force-pushed the improve-handling-of-data-channels branch from c7280b5 to b338693 Compare December 17, 2024 23:50
The PeerConnectionWrapper does not need to be injected in the
application, nor the Context needs to be injected in the
PeerConnectionWrapper. This all seems to be leftovers from the past, and
removing them would ease adding unit tests for the
PeerConnectionWrapper.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Log methods are static, so they can not be mocked using Mockito.
Although it might be possible to use PowerMockito a dummy implementation
was added instead, as Log uses are widespread and it is not something
worth mocking anyway.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
This implicitly fixes trying to send the initial state on the latest
remote data channel found (which is the one stored in the "dataChannel"
attribute of the "PeerConnectionWrapper") when any other existing data
channel changes its status to open. Nevertheless, as all this will be
reworked, no unit test was added for it.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
The legacy name was a bit strange, so now it is renamed to just "send"
as the parameter type ("DataChannelMessage") gives enough context.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Data channel messages are expected to be sent using the "status" data
channel that is locally created. However, if another data channel was
opened by the remote peer the reference to the "status" data channel was
overwritten with the new data channel, and messages were sent instead on
the remote data channel.

In current Talk versions that was not a problem, and the change makes no
difference either, because since the support for Janus 1.x was added
data channel messages are listened on all data channels, independently
of their label or whether they were created by the local or remote peer.

However, in older Talk versions this fixes a regression introduced with
the support for Janus 1.x. In those versions only messages coming from
the "status" or "JanusDataChannel" data channels were taken into
account. When Janus is not used the WebUI opens the legacy
"simplewebrtc" data channel, so that data channel may be the one used to
send data channel messages (if it is open after the "status" data
channel), but the messages received on that data channel were ignored by
the WebUI. Nevertheless, at this point this is more an academic problem
than a real world problem, as it is unlikely that there are many
Nextcloud servers with Talk < 16 and without HPB being used.

Independently of all that, when the peer connection is removed only the
"status" data channel is disposed, but none of the remote data channels
are. This is just a variation of an already existing bug (the last open
data channel was the one disposed due to being the last saved reference,
but the rest were not) and it will be fixed in another commit.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Data channel messages can be sent only when the data channel is open.
Otherwise the message is simply lost. Clients of the
PeerConnectionWrapper do not need to be aware of that detail or keep
track of whether the data channel was open already or not, so now data
channel messages sent before the data channel is open are queued and
sent once the data channel is opened.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Getting the label is no longer possible once the data channel has been
disposed. This will help to make the observer thread-safe.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Adding and disposing remote data channels is done from different
threads; they are added from the WebRTC signaling thread when
"onDataChannel" is called, while they can be disposed potentially from
any thread when "removePeerConnection" is called. To prevent race
conditions between them now both operations are synchronized.

However, as "onDataChannel" belongs to an inner class it needs to use a
synchronized statement with the outer class lock. This could still cause
a race condition if the same data channel was added again; this should
not happen, but it is handled just in case.

Moreover, once a data channel is disposed it can be no longer used, and
trying to call any of its methods throws an "IllegalStateException". Due
to this, as sending can be also done potentially from any thread, it
needs to be synchronized too with removing the peer connection.

State changes on data channels as well as receiving messages are also
done in the WebRTC signaling thread. State changes needs synchronization
as well, although receiving messages should not, as it does not directly
use the data channel (and it is assumed that using the buffers of a
disposed data channel is safe). Nevertheless a little check (which in
this case requires synchronization) was added to ignore the received
messages if the peer connection was removed already.

Finally, the synchronization added to "send" and "onStateChange" had the
nice side effect of making the pending data channel messages thread-safe
too, as before it could happen that a message was enqueued when the
pending messages were being sent, which caused a
"ConcurrentModificationException".

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
When the data channel is not open yet data channel messages are queued
and then sent once opened. "onStateChange" is called from the WebRTC
signaling thread, while "send" can be called potentially from any
thread, so to send the data channel messages in the same order that they
were added new messages need to be enqueued until all the pending
messages have been sent. Otherwise, even if there is synchronization
already, it could happen that "onStateChange" was called but, before
getting the lock, "send" gets it and sends the new message before the
pending messages were sent.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
@mahibi mahibi force-pushed the improve-handling-of-data-channels branch from b338693 to d63bb31 Compare January 3, 2025 10:33
Copy link
Contributor

github-actions bot commented Jan 3, 2025

APK file: https://www.kaminsky.me/nc-dev/android-artifacts/4536-talk.apk

qrcode

To test this change/fix you can simply download above APK file and install and test it in parallel to your existing Nextcloud Talk app.

Copy link
Contributor

github-actions bot commented Jan 3, 2025

Codacy

Lint

TypemasterPR
Warnings158158
Errors7171

SpotBugs

CategoryBaseNew
Bad practice66
Correctness224222
Dodgy code7171
Internationalization33
Malicious code vulnerability33
Performance44
Security11
Total312310

@mahibi mahibi merged commit 159465a into master Jan 3, 2025
16 of 17 checks passed
@mahibi mahibi deleted the improve-handling-of-data-channels branch January 3, 2025 11:29
@mahibi
Copy link
Collaborator

mahibi commented Jan 3, 2025

Thank you @danxuliu !!!

@mahibi mahibi mentioned this pull request Jan 3, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants