Problematic healing scenarios #620

Closed
denis-tingaikin opened this issue Dec 7, 2020 · 1 comment · Fixed by #738


denis-tingaikin commented Dec 7, 2020

Problem statement

Let's consider all healing scenarios, excluding registry cases (registry cases are still TODO).

Healing complexity

Let's try to calculate the healing complexity, where 1t is the time spent waiting for the next application in the chain to heal.

Local use-cases

  1. NSE dies, complexity is 3t
  2. NSMGR dies, complexity is 3t
  3. Forwarder dies, complexity is 1t
  4. NSC dies, complexity is 0

Remote use-cases

  1. NSE dies, complexity is 6t
  2. Remote NSMGR dies, complexity is 3t
  3. Remote Forwarder dies, complexity is t
  4. Local NSMGR dies, complexity is 3t
  5. Local Forwarder dies, complexity is t
  6. NSC dies, complexity is 0

Interdomain use-cases

  1. NSE dies, complexity is 7t
  2. Remote NSMGR dies, complexity is 3t
  3. Remote Forwarder dies, complexity is t
  4. Proxy NSMGR dies, complexity is t
  5. Local NSMGR dies, complexity is 3t
  6. Local Forwarder dies, complexity is t
  7. NSC dies, complexity is 0
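
The figures above can be collected into a small lookup table to compare scenarios. The following is a minimal Python sketch, not part of NSM: the names `HEAL_COMPLEXITY`, `worst_case` and `heal_wait_seconds` are hypothetical, and the values are copied from the lists above in units of t.

```python
# Healing complexity per scenario, in units of t, copied from the lists above.
# All names here are hypothetical and exist only for illustration.
HEAL_COMPLEXITY = {
    "local": {"NSE": 3, "NSMGR": 3, "Forwarder": 1, "NSC": 0},
    "remote": {
        "NSE": 6, "Remote NSMGR": 3, "Remote Forwarder": 1,
        "Local NSMGR": 3, "Local Forwarder": 1, "NSC": 0,
    },
    "interdomain": {
        "NSE": 7, "Remote NSMGR": 3, "Remote Forwarder": 1, "Proxy NSMGR": 1,
        "Local NSMGR": 3, "Local Forwarder": 1, "NSC": 0,
    },
}


def worst_case(use_case):
    """Return (component, complexity) for the slowest heal in a use-case.

    On ties, the component listed first wins.
    """
    scenarios = HEAL_COMPLEXITY[use_case]
    component = max(scenarios, key=scenarios.get)
    return component, scenarios[component]


def heal_wait_seconds(use_case, component, t_seconds):
    """Convert a complexity in units of t into seconds for a given t."""
    return HEAL_COMPLEXITY[use_case][component] * t_seconds


if __name__ == "__main__":
    for use_case in HEAL_COMPLEXITY:
        component, complexity = worst_case(use_case)
        print(f"{use_case}: '{component} dies' costs {complexity}t")
```

For example, with an assumed t of 2 seconds, `heal_wait_seconds("interdomain", "NSE", 2.0)` gives 14 seconds for the interdomain 'NSE dies' case.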

Visualization

[Figure: Heal use-cases]

Worst-case scenario analysis

  1. The most problematic scenario is 'NSE dies', because only the NSMGR that selected the NSE can heal the connection, and currently the remote NSMGR cannot simply transfer healing control to the local NSMGR. As a result, all elements between the local NSMGR and the NSE wait for the healing timeout.

  2. The next points for optimization are 'Local NSMGR dies' and 'Remote NSMGR dies'. We recently added serialization of requests, so a heal race is possible: in the worst case, the forwarder blocks requests from the client through serialization while it heals its NSMGR.
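
The heal race in point 2 can be illustrated with a toy model. This is only a sketch under assumptions: it uses a plain per-connection lock to stand in for request serialization, and the `SerializedConnection` class and `run_heal_race` function are hypothetical, not the NSM SDK implementation. It shows a client request being held back until an in-progress heal releases the connection.

```python
import threading
import time


class SerializedConnection:
    """Hypothetical stand-in for a forwarder that serializes work per connection."""

    def __init__(self):
        self._lock = threading.Lock()  # one operation at a time per connection
        self.log = []

    def heal(self, heal_started, duration):
        # Healing holds the connection for its whole duration.
        with self._lock:
            self.log.append("heal started")
            heal_started.set()
            time.sleep(duration)  # stands in for waiting on the NSMGR
            self.log.append("heal finished")

    def request(self):
        # A client request must wait for any operation already in progress.
        with self._lock:
            self.log.append("client request served")


def run_heal_race(heal_duration=0.05):
    conn = SerializedConnection()
    heal_started = threading.Event()
    healer = threading.Thread(target=conn.heal, args=(heal_started, heal_duration))
    healer.start()
    heal_started.wait()  # ensure the heal already holds the connection
    conn.request()       # blocks until the heal releases the lock
    healer.join()
    return conn.log


if __name__ == "__main__":
    print(run_heal_race())
```

In this model the client request is always served after the heal finishes, so the time the client waits grows with the heal duration.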


cc @edwarnicke, @haiodo, @fkautz

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Dec 7, 2020
@artem-belov artem-belov self-assigned this Jan 12, 2021