Clarify a Node's System API (re)discovery procedure #6

Open
garethsb opened this issue Feb 11, 2020 · 4 comments

Comments

@garethsb
Contributor

(Copied from discussion elsewhere)

I would like IS-09 to clarify the relationship between the Node's discovery procedure for a System API and the discovery procedure for a Registration API defined by IS-04.

IS-04 is clear about how connection failures, errors and timeouts should be handled, including retry, exponential back-off, etc. This process doesn't prevent the Node starting up its other functions, Node API, senders and receivers, etc.
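
For illustration only, the kind of retry with exponential back-off described above could be sketched as follows. The function names, the initial delay, the factor and the cap are all hypothetical; they are not the values IS-04 specifies.

```python
def backoff_delays(initial=1.0, factor=2.0, maximum=30.0):
    """Yield successive retry delays in seconds, growing by `factor` up to `maximum`.

    All three defaults are illustrative assumptions, not values taken from IS-04.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def discover_with_retry(attempt, max_attempts, sleep=lambda seconds: None):
    """Call attempt() until it returns a non-None value or attempts run out.

    `attempt` is a hypothetical callable standing in for one discovery try.
    `sleep` defaults to a no-op so the sketch can be exercised without waiting;
    pass time.sleep for real delays.
    Returns (result, list_of_delays_waited).
    """
    waited = []
    delays = backoff_delays()
    for i in range(max_attempts):
        result = attempt()
        if result is not None:
            return result, waited
        if i < max_attempts - 1:
            # Back off before the next try; no wait after the final failure.
            delay = next(delays)
            sleep(delay)
            waited.append(delay)
    return None, waited
```

Running a loop like this on its own thread or task would leave the Node free to start its Node API, senders and receivers in the meantime.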

I can't find the same level of detail in TR-1001-1 or IS-09 about interaction with the System API. In particular, I'd like to understand whether System API discovery may be performed simultaneously with Registration API discovery, and whether System API re-discovery should ever be performed, e.g. periodically, or perhaps when failures have been encountered with all Registration APIs, which might suggest that is04.heartbeat_interval has been changed in the system.

--

This may be partially addressed by PR #3.

@andrewbonney
Contributor

Perhaps the first thing to clarify is whether the system resource is intended purely for device startup, or whether it is for maintaining correct configuration over longer periods. The former is notionally simpler, but does present at least a couple of issues:

  • If a Media Node comes online before the system resource is available (after a major outage), it will blindly assume its previous config (although generally this is probably the correct behaviour in this condition).
  • If the system resource config is changed for any reason, only new or rebooted devices will obtain this configuration, with the others requiring manual intervention.

The latter definition is certainly more flexible, but likely requires a lot more to be defined as a result. The PR mentioned (or at least the TTL aspect) is likely only relevant in this case.

Perhaps v1.0 could be limited to the former, with room to expand into the latter behaviour in a v1.1 at a later date.

@garethsb
Contributor Author

Confirming which of those approaches is expected would be a great start. Thanks, Andrew.

Even in the former case - which is all that is checked by the JT-NM Tested criteria right now, I believe - I think there are still details to nail down, like how long the Node waits for the System API (or how many times it retries) at start-up, and whether it is permitted to connect to a Registry, enable RTP transmission, etc. during this period.

@wsneijers

Good point. Personally I think the second approach makes more sense:

  • It is more robust in regard to high availability, resource updates and startup sequence.
  • It is more in line with existing discovery mechanisms (I'm referring to IS-04 registered and peer-to-peer operation).

But indeed it is more complex and it may be better to start simple and expand from there.

@garethsb
Contributor Author

The difference between a Node's communication with the System API and with the Registration API is that the former is currently a single GET request, whereas the latter involves regular heartbeat POST requests. Encountering an error in a Registration API request is the specified trigger to discover an alternative Registry. There is no such regular request mechanism defined between the Node and the System API, so it would need something else, such as a TTL or a time interval as used in API security/authorization. (The fact that Node registration behaviour is 'sticky' unless it encounters errors has sometimes been confusing.)
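
To make the possible triggers concrete, here is a minimal sketch of a re-discovery decision, assuming a TTL on the fetched configuration plus the "all Registration APIs have failed" condition raised earlier in this issue. Neither trigger is defined by IS-09 today; `SystemApiState`, `ttl` and `should_rediscover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemApiState:
    fetched_at: float  # time (in seconds) of the last successful System API GET
    ttl: float         # hypothetical freshness period in seconds, not defined by IS-09


def should_rediscover(state: Optional[SystemApiState], now: float,
                      registries_tried: int, registries_failed: int) -> bool:
    """Decide whether the Node should (re)discover the System API.

    Assumed triggers, none of which are specified behaviour:
      1. nothing has been fetched yet,
      2. the fetched configuration is older than its TTL,
      3. every Registration API that was tried has failed.
    """
    if state is None:
        return True
    if now - state.fetched_at >= state.ttl:
        return True
    return registries_tried > 0 and registries_failed == registries_tried
```

The third condition is what would break the 'sticky' behaviour: without it, a Node that never sees the TTL expire has no reason to ask the System API again.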

We have a prototype that polls the System API at a fixed time interval. At start-up, before a System API has been discovered, it currently enables RTP senders/receivers and uses a Registry heartbeat interval taken from cached values.
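
A rough sketch of that idea (not the prototype's actual code) is shown below. It assumes the configuration is held as a dict with an `is04.heartbeat_interval` entry, that `fetch` is a hypothetical callable performing the System API GET and raising `OSError` when the API is unreachable, and that 5 seconds is a sensible last-resort default; all of these are assumptions made for illustration.

```python
DEFAULT_HEARTBEAT_INTERVAL = 5  # seconds; assumed fallback, not taken from this thread


def effective_heartbeat_interval(fetched, cached):
    """Prefer the freshly fetched config, then the cached one, then the default."""
    for source in (fetched, cached):
        if source is not None:
            try:
                return source["is04"]["heartbeat_interval"]
            except (KeyError, TypeError):
                continue  # entry missing or malformed: try the next source
    return DEFAULT_HEARTBEAT_INTERVAL


def poll_once(fetch, cached):
    """Run one poll of the System API.

    Returns (config, updated): on failure the cached config is kept and
    updated is False, so the Node carries on with its previous values.
    """
    try:
        fresh = fetch()
    except OSError:
        return cached, False
    return fresh, fresh != cached
```

With this shape, the Node can start heartbeating immediately using `effective_heartbeat_interval(None, cached)` and switch to new values whenever a later `poll_once` reports an update.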


3 participants