-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec: implement first draft of protocol v2.0: no-connection-serial #1526
Conversation
140d248
to
532e4c4
Compare
532e4c4
to
97717ad
Compare
97717ad
to
6b4e019
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not had time to review the contents of this pull request, in depth, yet... however, I wanted to make a couple of immediate observations:
- I don't feel that mutating features spec points is a good idea, because we don't have any concept of versioning of particular spec points downstream. In this case an example is the change to
RTN16c
. There are plenty of comments and attributes in SDK source code bases which reference that spec point, yet they will remain static even once this change merges. Even once there's a feature manifest in those SDKs stating which version of the canonical features list they refer to, the features spec points are still always referenced just as that - so the work of cross-verifying whether they apply at a given version feels insurmountable. My preference is that we declare spec points deleted, per our guidance, and replace them with a new spec point under a brand new reference. - We're currently in the process of establishing a new home for the features spec and Realtime protocol information, that will be independently versioned, outside of
ably/docs
, and hopefully also outside ofably/features
also (ADR66 in review).
Both relate to ADR66.
Also, the addition of RTN16g1
feels like it's starting to lean into a territory that should be outside of scope for the features spec.. How we do language/platform idiomatic variance should really be something that we document elsewhere, or implicitly document through prototype in existing codebases. My preference going forwards is that the spec describes the behaviour that's expected, without necessarily needing to go into specifics of API design. Speaking of mutability and side effects is appropriate for the features spec, but defining whether something is a function, method, field or property is something that can carry very different meanings for different languages and platforms.
Sure, I do appreciate that in general - there's multiple spec points I removed rather than mutating. RTN16c I kept just because I felt like the substance of it wasn't changed, we've just renamed the thing it's about. So before it was But if you disagree I'm happy to make that change.
I'm a bit confused here -- afaics the spec is mostly about describing the API design, I don't really understand what it would look like if it didn't do that? I understand you're planning on splitting off the IDL, but the new IDL also explicitly distinguishes properties from functions just like the old one..? I do appreciate that the "SDKs which choose not to increment their public major version when implementing protocol v2.0 may continue to implement this as ..." instruction may well not be appropriate for the spec, and I'd appreciate your advice on what to do here. I figured that without RTN16g1, the spec is basically mandating an change to the user-visible API, forcing a major version bump even if one would not otherwise be necessary, so I was trying to walk a line that didn't force that -- would be happy to have guidance on the right line to walk here. |
Thanks, @SimonWoolf. 😁 In response to points raised in your previous comment...
I'll take another look as a review the PR in closer detail. However, my gut still says that even a redirection (as result of the rename) is change enough to warrant a new spec point reference. 🤔
There is a lot of opinionated API design built into the features spec currently, I agree. My preference going forwards is that we veer away from mandating API design choices that could conflict with language or platform idiomatic alternatives. I tried to capture that in this internal comment, replying to @Srushtika, on ADR66 (New home for the features specification):
In respect of:
Again, I need to look closer at this change, however more generally there might be a softer change where we can move the SDK as a |
@QuintinWillison I've replaced RTN16c: a1e7f06 I'll leave RTN16g1 until you take a closer look -- I'm happy to write pretty much what you suggest, but that seems like it'd unavoidably mean without getting into the specifics of what a client lib should do until & after it increments its major sdk version, which it seems like you're saying you want to avoid? @andydunstall could you review 8179de3 for correctness? |
@sacOO7 pointed out that the current spec doesn't actually explicitly say how connection recovery works. It has some info around the edges, but is missing the central bits, like that the recover param should be set to the connectionKey. This seems like a good opportunity to fix that, therefore: 2a932d4 Also moved the new channelserial state to channel.properties as I'd forgotten that existed: 5786bfe |
This pull request shall remain open in this repository ( |
**** @(RTN15c1)@ @CONNECTED@ @ProtocolMessage@ with the same @connectionId@ as the current client, and no @error@. In this case, the server is indicating that the resume succeeded, all channels are still attached, and all backlog messages are available. The client should not change the state of attached channels, and immediately process any messages queued by the connection | ||
**** @(RTN15c2)@ @CONNECTED@ @ProtocolMessage@ with the same @connectionId@ as the current client, and an @error@. In this case, the server is indicating that the resume succeeded but with a non-fatal error, all channels are still attached, and some backlog messages may be unavailable. The @ErrorInfo@ received should be set as the @reason@ in the @CONNECTED@ event, and the @Connection#errorReason@ should be set. The client should not change the state of attached channels, and immediately process any messages queued by the connection. Any channels that are not resumed in full may receive an @ATTACHED@ @ProtocolMessage@ with an @error@, see "RTL12":#RTL12 | ||
**** @(RTN15c3)@ @CONNECTED@ @ProtocolMessage@ with a new @connectionId@ (and usually an @ErrorInfo@ in the @error@ field). In this case, a new connection has been established, the resume was unsuccessful, the channels are no longer attached, and the error indicates the cause of the unsuccessful resume. The @error@ if present should be set as the @reason@ in the @CONNECTED@ event, and as the @Connection#errorReason@. The client library should initiate an attach for channels that are in the @SUSPENDED@ state. For all channels in the @ATTACHING@ or @ATTACHED@ state, the client library should initiate a new attach (i.e. a new @ATTACH@ @ProtocolMessage@ sent for each channel). Finally, the internal @msgSerial@ counter is reset so that the first message published to Ably will contain a @msgSerial@ value of @0@ | ||
**** @(RTN15c6)@ A @CONNECTED@ @ProtocolMessage@ with the same @connectionId@ as the current client (and no @error@ property). This indicates that the resume attempt was valid. The client library should move all channels that were in the @ATTACHING@, @ATTACHED@, or @SUSPENDED@ states to the @ATTACHING@ state, and initiate an @RTL4c@ attach sequence for each. The connection should also process any messages queued per @RTLc6@ (there is no need to wait for the attaches to finish before processing queued messages). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mention serverside backlog messages are available and will be received on every freshly attached channel
. Because we are adding an explicit integration test for the same. Actually, the resume is well described on the site -> https://ably.com/docs/realtime/connection/, but cannot find it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, getting a CONNECTED with no error does not guarantee that the resume on each channel will succeed. That's kindof the point of this new resume model: it's now handled independently by each channel, the resume for each channel will succeed iff the channelSerial you send maps to something still in memory, irrespective of what the connection is doing.
The connection resume now pretty much just means you can keep using the same connectionId, which is really only relevant for presence members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacOO7 If this conversation needs to continue, then that will need to happen against the new pull request that @SimonWoolf will open against ably/specification#8. Please resolve this conversation here once you have re-raised your concern there, or if you are already happy that this matter is resolved. Thanks.
*** @(RTP17c2)@ When an @ATTACHED@ message is received with no "@HAS_PRESENCE@":#TR3a flag (so no @SYNC@ is expected as the server does not believe anyone is currently present) | ||
** @(RTP17d)@ Automatic re-entry consists of, for each member of the @RTP17@ internal @PresenceMap@ whose @memberKey@ is not also a member of the normal @PresenceMap@, publishing a @PresenceMessage@ with an @ENTER@ action using the @clientId@ and @data@ attributes from that member, and removing that member from the internal @PresenceMap@ | ||
** @(RTP17f)@ The RealtimePresence object should perform automatic re-entry whenever a channel moves into the @ATTACHED@ state. (It does not need to do so for @RTL12@ @ATTACHED@ events received on already-@ATTACHED@ channels). | ||
** @(RTP17g)@ Automatic re-entry consists of, for each member of the @RTP17@ internal @PresenceMap@, publishing a @PresenceMessage@ with an @ENTER@ action using the @clientId@, @data@, and @id@ attributes from that member. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to remove member from internal presence map once presence
event is published?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Till now, for every ENTER
presence action, we were passing only clientId
and data
. Are we supposed to pass id
for every other presence ENTER
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to remove member from internal presence map once presence event is published?
No. That spec item was RTP17d, it was removed for 2.0.
Till now, for every ENTER presence action, we were passing only clientId and data. Are we supposed to pass id for every other presence ENTER now?
When you send an ENTER becuase the use called presence.enter(), you don't add an id because there isn't one. This spec item is about re-entry specifically; where you do have an id because the internal my-members map is populated by presence messages from the server, which will have ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacOO7 If this conversation needs to continue, then that will need to happen against the new pull request that @SimonWoolf will open against ably/specification#8. Please resolve this conversation here once you have re-raised your concern there, or if you are already happy that this matter is resolved. Thanks.
** @(RTN16f)@ When a library is instantiated with the @recover@ client option, it should initialize its internal @msgSerial@ counter to the @msgSerial@ component of the @recoveryKey@. (If the recover fails, the counter should be reset to 0 per "RTN15c7":#RTN15c7 ) | ||
** @(RTN16j)@ When a library is instantiated with the @recover@ client option, for every channel/channelSerial pair in the @recoveryKey@, it should instantiate a corresponding channel and set its "RTL15b":#RTL15b @channelSerial@. | ||
** @(RTN16k)@ When the library first connects to Ably after being instantiated with a @recover@ client option, it should add an additional @recover@ querystring param to the websocket request, set from the @connectionKey@ component of the @recoveryKey@. Once the library has successfully connected to Ably, it should never again supply a @recover@ querystring param. | ||
** @(RTN16g)@ @Connection#getRecoveryKey@ is function that returns a string which incorporates the @connectionKey@, the current @msgSerial@, and a collection of pairs of channel @name@ and current @channelSerial@ for every currently attached channel. This must be serialized in a way which is able to encode any unicode channel name. The SDK may assume that the recovery key will only be consumed by the same type of SDK, so this spec does not specify any particular serialization; however, the format should be forward-compatible through the same major version of the SDK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add, if decoding fails when recover
value is supplied externally, it should raise some error? Currenly, we silently catch the error and then return null as a decode
value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's implicit that if someone supplies a malformed ClientOption
, the library should raise an error (eg throw an exception on construction), there's nothing unique about recover
in that.
If you think it should be more explicit, that's reasonable, but that wouldn't be anything to do with no-connection-serials, so I don't think it would belong in this PR. I suspect @QuintinWillison has his own plans to impove how error handling in SDKs is specified in the features spec, I'll leave that to him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned currently we don't throw error, rather silently catch it and return null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a warning log when decoding fails and then it returns null, and then it won't set recover
param for null value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacOO7 If this conversation needs to continue, then that will need to happen against the new pull request that @SimonWoolf will open against ably/specification#8. Please resolve this conversation here once you have re-raised your concern there, or if you are already happy that this matter is resolved. Thanks.
@@ -485,29 +483,38 @@ h3(#realtime-connection). Connection | |||
** @(RTN15g)@ Connection state is only maintained server-side for a brief period, given by the @connectionStateTtl@ in the @connectionDetails@, see "CD2f":#CD2f. If a client has been disconnected for longer than the @connectionStateTtl@, it should not attempt to resume. Instead, it should clear the local connection state, and any connection attempts should be made as for a fresh connection | |||
*** @(RTN15g1)@ This check should be made before each connection attempt. It is generally not sufficient to merely clear the connection state when moving to @SUSPENDED@ state (though that may be done too), since the device may have been sleeping / suspended, in which case it may have been many hours since it was last actually connected, even though, having been in the @CONNECTED@ state when it was put to sleep, it has only moved out of that state very recently (after waking up and noticing it's no longer connected) | |||
*** @(RTN15g2)@ Another consequence of that is that the measure of whether the client been disconnected for too long (for the purpose of this check) cannot just be whether the client left the @CONNECTED@ state more than @connectionStateTtl@ ago. Instead, it should be whether the difference between the current time and the last activity time is greater than the sum of the @connectionStateTtl@ and the @maxIdleInterval@, where the last activity time is the time of the last known actual sign of activity from Ably per "RTN23a":#RTN23a | |||
*** @(RTN15g3)@ When a connection attempt succeeds after the connection state has been cleared in this way, channels that were previously @ATTACHED@, @ATTACHING@, or @SUSPENDED@ must be automatically reattached, just as if the connection was a resume attempt which failed per "RTN15c3":#RTN15c3 | |||
** @(RTN15b)@ In order for a connection to be resumed and connection state to be recovered, the client must have received a @CONNECTED@ ProtocolMessage which will include a private connection key. To resume that connection, the library reconnects to the "websocket":https://ably.com/topic/websockets endpoint with two additional querystring params: | |||
*** @(RTN15g3)@ When a connection attempt succeeds after the connection state has been cleared in this way, channels that were previously @ATTACHED@, @ATTACHING@, or @SUSPENDED@ must be automatically reattached, just as if the connection was a resume attempt which failed per "RTN15c7":#RTN15c7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need even this spec point, it says the same thing as RTN15c7
. Feels more like an incomplete duplication, and confusing at the same time. We are saying the connection attempt succeeds yet the resume fails. I mean it doesn't cover a case for a successful resume, because it does the exact same thing channels that were previously @ATTACHED@, @ATTACHING@, or @SUSPENDED@ must be automatically reattached
as per RTN15c6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it doesn't cover a case for a successful resume
A successful resume when the library is not attempting to resume is not a possible case
We are saying the connection attempt succeeds yet the resume fails
the resume did not 'fail', in the rtn15g case there is no attempt at a resume.
I don't think I agree that it's duplicative. RTN15c7 is a resume attempt which failed. RTN15g has the library not attempt a resume, there is nothing to 'fail'. It's saying that in this case, channels should also be reattached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RTN15g3
is talking about resume connection attempts right? I mean it will keep sending resume
flag in the request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RTN15g3 is a subitem of RTN15g. Read RTN15g.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see a condition where this seems to be feasible ->
State.Connection.HasConnectionStateTtlPassed(Now)
For this condtion, resume
and recover
is out of scope. Client will independency retry without doing resume
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacOO7 If this conversation needs to continue, then that will need to happen against the new pull request that @SimonWoolf will open against ably/specification#8. Please resolve this conversation here once you have re-raised your concern there, or if you are already happy that this matter is resolved. Thanks.
** @(RTN16k)@ When the library first connects to Ably after being instantiated with a @recover@ client option, it should add an additional @recover@ querystring param to the websocket request, set from the @connectionKey@ component of the @recoveryKey@. Once the library has successfully connected to Ably, it should never again supply a @recover@ querystring param. | ||
** @(RTN16g)@ @Connection#getRecoveryKey@ is function that returns a string which incorporates the @connectionKey@, the current @msgSerial@, and a collection of pairs of channel @name@ and current @channelSerial@ for every currently attached channel. This must be serialized in a way which is able to encode any unicode channel name. The SDK may assume that the recovery key will only be consumed by the same type of SDK, so this spec does not specify any particular serialization; however, the format should be forward-compatible through the same major version of the SDK. | ||
*** @(RTN16g1)@ SDKs which choose not to increment their public major version when implementing protocol v2.0 may continue to implement this as a @Connection#recoveryKey@ attribute until the next major version change, using a getter function that's backwards-compatible with attribute access where possible, else by recomputing the recovery key string on every message. | ||
** @(RTN16h)@ @Connection#getRecoveryKey@ should return @Null@ when a the SDK is in the @CLOSED@, @CLOSING@, @FAILED@, or @SUSPENDED@ states, or when it does not have a @connectionKey@ (for example, it has not yet become connected). The @Connection#key@ and @Connection#id@ should also be @Null@ at these times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check if connectionId
is null too? Currently I am only checking if connectionKey
is null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am currently only checking
if (Key.IsEmpty() || InnerState.State == Realtime.ConnectionState.Closing
|| InnerState.State == Realtime.ConnectionState.Closed
|| InnerState.State == Realtime.ConnectionState.Failed
|| InnerState.State == Realtime.ConnectionState.Suspended)
{
return null;
}
While in cocoa, we also check if id
should be empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks fine to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean I should also check for id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, what you have there already looks fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, thanks : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacOO7 If this conversation needs to continue, then that will need to happen against the new pull request that @SimonWoolf will open against ably/specification#8. Please resolve this conversation here once you have re-raised your concern there, or if you are already happy that this matter is resolved. Thanks.
RTN15f was wrong wrong (implied you should only republish if a resume was successful) and redundent to RTN19a.
9632561
to
988218b
Compare
988218b : I have removed RTN15f entirely (it was mostly redundent to RTN19a, and the bit that wasn't redundent was wrong as it implied you should only republish if a resume was successful, which is no longer correct given transient publishing), and instead added a couple small clarifications to RTN19a. |
** @(RTP17d)@ Automatic re-entry consists of, for each member of the @RTP17@ internal @PresenceMap@ whose @memberKey@ is not also a member of the normal @PresenceMap@, publishing a @PresenceMessage@ with an @ENTER@ action using the @clientId@ and @data@ attributes from that member, and removing that member from the internal @PresenceMap@ | ||
** @(RTP17f)@ The RealtimePresence object should perform automatic re-entry whenever a channel moves into the @ATTACHED@ state. (It does not need to do so for @RTL12@ @ATTACHED@ events received on already-@ATTACHED@ channels). | ||
** @(RTP17g)@ Automatic re-entry consists of, for each member of the @RTP17@ internal @PresenceMap@, publishing a @PresenceMessage@ with an @ENTER@ action using the @clientId@, @data@, and @id@ attributes from that member. | ||
** @(RTP17c)@ This clause and its subclauses have been replaced by @RTN17f@ as of the 2.0 spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to be RTP17f
instead of RTN17f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacOO7 If this conversation needs to continue, then that will need to happen against the new pull request that @SimonWoolf will open against ably/specification#8. Please resolve this conversation here once you have re-raised your concern there, or if you are already happy that this matter is resolved. Thanks.
This change proposed in this pull request has been moved to the new home of the features spec, in the |
@@ -54,7 +54,7 @@ h2(#test-guidelines). Test guidelines | |||
* @(G1)@ Every test should be executed using all supported protocols (i.e. JSON and "MessagePack":https://msgpack.org/ if supported). This includes both sending & receiving data | |||
* @(G2)@ All tests by default are run against a special Ably sandbox environment. This environment allows apps to be provisioned without any authentication that can then be used for client library testing. Bear in mind that all apps created in the sandbox environment are automatically deleted after 60 minutes and have low limits to prevent abuse. Apps are configured by sending a @POST@ request to @https://sandbox-rest.ably.io/apps@ with a JSON body that specifies the keys and their associated capabilities, channel namespace rules and any presence fixture data that is required; see "ably-common test-app-setup.json":https://github.com/ably/ably-common/blob/main/test-resources/test-app-setup.json. Presence fixture data is necessary for the REST library presence tests as there is no way to register presence on a channel in the REST library | |||
* @(G3)@ Testing statistics can be tricky due to timing issues and slow test suites as a result of sending requests to generate statistics. As such, we provide a special stats endpoint in our sandbox environment that allows stats to be injected into our metrics system so that stats tests can make predictable assertions. To create stats you must send an authenticated @POST@ request to the stats JSON to @https://sandbox-rest.ably.io/stats@ with the stats data you wish to create. See the "JavaScript stats fixture":https://github.com/ably/ably-js/blob/4e65d4e13eb8750a375b9511e4dd059092c0e481/spec/rest/stats.test.js#L8-L51 and "setup helper":https://github.com/ably/ably-js/blob/4e65d4e13eb8750a375b9511e4dd059092c0e481/spec/common/modules/testapp_manager.js#L158-L182 as an example | |||
* @(G4)@ This spec defines API version 1.2. A client library must identify to Ably the version of the spec it uses in all requests and connections, per "RSC7a":#RSC7a and "RTN2f":#RTN2f. The spec it uses is defined as the latest API version for which the library implements all spec items relating to the wire protocol | |||
* @(G4)@ This spec defines API version 2.0. A client library must identify to Ably the version of the spec it uses in all requests and connections, per "RSC7a":#RSC7a and "RTN2f":#RTN2f. The spec it uses is defined as the latest API version for which the library implements all spec items relating to the wire protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we highlight 2.0
as a string? Actually, mistakenly in dotnet code it was using v=2
as a param and hence server was failing to recognize the correct protocol version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ably/specification#88 I've made it an inline code block to make it clearer that it's a literal string.
Ported to ably/specification#88 |
As described in ADR63
NB: this is the first draft of a new breaking spec version. It describes a significant change in the realtime protocol, and cannot be implemented piecemeal. I've made the base branch
integration/protocol-2.0
per Cf https://github.com/ably/docs/blob/main/README_INTERNAL.md#branch-and-tag-scheme-for-features-spec .It needed fewer changes than I was expecting.