Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection resumption improvements #900

Merged
merged 5 commits into from
Jan 25, 2023

Conversation

ikbalkaya
Copy link
Contributor

@ikbalkaya ikbalkaya commented Jan 13, 2023

This PR introduces some fixes to connection resumption issues by implementing spec RTN15c1, RTN15c2 and RTN15c3. To summarize the rule of implementation.

  • If connection resume is successful, current serial is set to the first serial of pending queue, queue is cleared and previously sent pending messages are added in front of queued messages.
  • If connection resume is failed the message serial is reset to 0 and all pending messages are added the queue to be sent with the newly assigned serials

Previously when there was a new connection established, channels were suspended and pending messages were reset. This should not be happening anymore

Also I have made some changes to general interface of WebSocketTransport to be able to mock blocking receives

  • There is a new receive method that uses a single method interface to delegate handling of sending messages to connectionManager. There are also some changes with MockWebSocketTransport that allows blocking incoming acks/nacks. This class is also now exposed so that we can read sent published messages directly. Please note that changes done here for these classes only serve the particular case of publishing. I think it can be improved to contain other protocol actions too.

Some changes are inspired and taken from #842

Closes #474

@ikbalkaya ikbalkaya changed the title Reattach channel on reconnection Connection resumption improvemnts Jan 13, 2023
@ikbalkaya ikbalkaya changed the title Connection resumption improvemnts Connection resumption improvements Jan 13, 2023
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 13, 2023 21:19 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 16, 2023 11:46 Inactive
@ikbalkaya ikbalkaya requested a review from paddybyers January 16, 2023 14:28
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 16, 2023 14:56 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 16, 2023 16:49 Inactive
@ikbalkaya ikbalkaya self-assigned this Jan 16, 2023
@ikbalkaya ikbalkaya marked this pull request as ready for review January 16, 2023 18:05
Copy link
Contributor

@davyskiba davyskiba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes make sense, but keep in my mind my relatively small ably-java knowledge

@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 17, 2023 11:04 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 17, 2023 11:06 Inactive
@ikbalkaya ikbalkaya requested a review from KacperKluka January 17, 2023 11:09
Copy link
Member

@paddybyers paddybyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments inline.

@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 16:39 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 16:47 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 17:40 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 17:43 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 17:50 Inactive
Copy link
Member

@paddybyers paddybyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see inline comments.

I suggest please that we tidy this and remove the whitespace-related commits, because the commit history is now a mess.

And we still need tests pls.

@ikbalkaya ikbalkaya force-pushed the bug/lib_not_resending_pending_messages_after_resume branch from 0494dfd to 34a6826 Compare January 18, 2023 21:32
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 21:34 Inactive
@ikbalkaya ikbalkaya force-pushed the bug/lib_not_resending_pending_messages_after_resume branch from 34a6826 to d7a7081 Compare January 18, 2023 21:34
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 18, 2023 21:35 Inactive
@ikbalkaya
Copy link
Contributor Author

I suggest please that we tidy this and remove the whitespace-related commits, because the commit history is now a mess.

Sorry about this, this should be clearer now - I squashed commits that were causing that

And we still need tests pls.

I have not finished addressing everything yet sorry. I should request for another review when I'm done

@ikbalkaya ikbalkaya marked this pull request as draft January 19, 2023 11:40
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 19, 2023 12:19 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 20, 2023 18:41 Inactive
Copy link
Member

@paddybyers paddybyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see inline comments

@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 23, 2023 14:57 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 23, 2023 15:01 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 23, 2023 15:18 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 23, 2023 15:35 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 23, 2023 15:52 Inactive
@ikbalkaya ikbalkaya force-pushed the bug/lib_not_resending_pending_messages_after_resume branch from 64ce58f to a1ee4ba Compare January 23, 2023 15:55
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 23, 2023 15:56 Inactive
@ikbalkaya ikbalkaya marked this pull request as ready for review January 23, 2023 16:20
@ikbalkaya ikbalkaya requested review from paddybyers and jaley January 23, 2023 16:23
Copy link
Member

@paddybyers paddybyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good.

Can you please tidy the commit history before merging this please?

This is the first commit that addresses connection resumption issues by picking and adapting some changes from #842

This change only apply channel reattachment and overrides the previous behaviour where pending messages were suspended if a new connection was established with a non-fatal error.
This commit makes resume case explicit by isolating it into a single if branch.
There were also some comments left by Paddy which I thought will be useful to keep in codebase.
@ikbalkaya ikbalkaya force-pushed the bug/lib_not_resending_pending_messages_after_resume branch from a1ee4ba to e46d8a5 Compare January 24, 2023 10:39
@github-actions github-actions bot temporarily deployed to staging/pull/900/javadoc January 24, 2023 10:40 Inactive
When sending messages, the reattempted messages from pending queue would be readded because sendImpl will push a new message to the queue if ack is required.
This change clones previous messages from pending queue, using it as a reference to reattempt, but clears queue before reattempt.

So now how it works:
If the connection resumed successfully, pending messages (if available) will be added in front of queued messages, resetting the start serial to the first serial of pending messages.

If connection resume is not successful, msgSerial and start serial will be reset to 0.

This will also remove the explicit call to send() as the send should happen when the connection transition event arrives which should happen after this.

Please note that I kept msgSerial = 0 as this will be increased in sendImpl
Message serials are now reset based on whether the connection resume succeeded or failed. In case of success, the message serial will be reset to the first serial of pending messages and in case of failure that serial will be reset to 0, the same applies to start serial in pending message queue
MockWebSocketTransport and WebSocket transport have send() but not receive() functionality, That's why we are unable to allow / block or fail incoming messages.

In this commit I add a receiver interface that routes received protocol messages to websocket transport. as a result connectionManager.onMessage call moves to WebSocketTransport and as a result we can use the mock to set rules for receives. There is also a simple receive behaviour on MockWebSocketTransport that should allow us to define the same rules for receives as sends.

Also MockWebSocketTransport is now public and has a new instance publishedMessages that allows us assert internals / orders of those published messages.
Two tests testing the pending message behaviour : The first one is when the resume is successful and the second one is when the resume has failed
Copy link
Member

@paddybyers paddybyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Copy link

@jaley jaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked at the code but see you already had two others do that. Instead, I checked out this branch, built it and integrated it with AAT to rerun the NetworkingConnectivityTest. I can confirm that this fixes the following fault simulations:

  • DisconnectWithFailedResume
  • EnterUnresponsive

For those two, it'd be good to re-enable those tests when you integrate this release to AAT, to confirm they're green on main too.

It also fixes the problems initially causing ReenterOnResumeFailed to fail, leaving us now with just a non-fatal NACK causing an exception to be thrown - a separate known issue, so we can't re-enable that test until the fix comes it.

Nice work!

@ikbalkaya ikbalkaya merged commit 42b3608 into main Jan 25, 2023
@ikbalkaya ikbalkaya deleted the bug/lib_not_resending_pending_messages_after_resume branch January 25, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Lib is not re-sending pending messages on new transport after a resume
4 participants