Problematic healing scenarios #620

denis-tingaikin · 2020-12-07T10:30:30Z

Problem statement

Let's consider all healing scenarios excluding the registry cases (the registry cases are still TODO).

Let's try to estimate the healing complexity, where 1*t is the time spent waiting for the next application in the chain to heal.

The most problematic scenario is 'NSE dies': only the NSMGR that selected the NSE can heal the connection, and currently the remote NSMGR cannot simply transfer healing control to the local NSMGR. As a result, all elements between the local NSMGR and the NSE wait for the healing timeout.
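A toy model can make the 'NSE dies' cost concrete. This is only a sketch under stated assumptions, not NSM code: the chain names below are hypothetical, the NSE is assumed to have been selected by the local NSMGR, and each element between the healer and the dead NSE is assumed to wait 1*t for its downstream neighbour, time out, and only then report the failure upstream, so the timeouts add up one after another.

```python
# Hypothetical forwarding chain for a connection between nodes, client to endpoint.
CHAIN = ["nsc", "local-nsmgr", "local-forwarder",
         "remote-nsmgr", "remote-forwarder", "nse"]


def healing_time(chain, failed, healer, t=1.0):
    """Toy estimate of how long it takes before `healer` can start healing.

    Assumption: each element strictly between `healer` and `failed` waits t for
    its downstream neighbour to heal, times out, and only then reports the failure
    upstream, so the individual timeouts add up sequentially.
    Returns (total wait, list of elements that hit their timeout).
    """
    f = chain.index(failed)
    h = chain.index(healer)
    if h >= f:
        raise ValueError("healer must be upstream of the failed element")
    waiting = chain[h + 1:f]
    return len(waiting) * t, waiting


total, waiting = healing_time(CHAIN, failed="nse", healer="local-nsmgr")
print(total, waiting)
# prints: 3.0 ['local-forwarder', 'remote-nsmgr', 'remote-forwarder']
```

In this toy model, if the remote NSMGR could heal an NSE death itself, only the remote forwarder would wait (`healing_time(CHAIN, "nse", "remote-nsmgr")` gives 1*t instead of 3*t), which is the motivation for letting healing control move closer to the failure.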
The next candidates for optimization are 'Local NSMGR dies' and 'Remote NSMGR dies'. We recently added request serialization, so a heal race is possible: in the worst case, the forwarder blocks the client's requests through serialization while it is healing its NSMGR.
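The blocking effect of serialization during a heal can be shown with a small Python model. This is a hypothetical sketch, not the NSM sdk's serialization code: `Serializer`, `heal_race`, the connection ID `conn-1`, and the `heal_duration` sleep that stands in for waiting out the NSMGR heal timeout are all invented for illustration. The only assumption carried over is that operations on the same connection ID run strictly one at a time.

```python
import threading
import time


class Serializer:
    """Runs operations for the same connection ID strictly one at a time."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, conn_id):
        # One lock per connection ID, created on first use.
        with self._guard:
            return self._locks.setdefault(conn_id, threading.Lock())

    def run(self, conn_id, op):
        with self._lock_for(conn_id):
            return op()


def heal_race(heal_duration=0.1):
    s = Serializer()
    log = []
    heal_started = threading.Event()

    def heal():
        heal_started.set()  # runs inside run(), so the connection lock is held
        log.append("heal:start")
        time.sleep(heal_duration)  # stands in for waiting out the NSMGR heal timeout
        log.append("heal:end")

    def client_request():
        log.append("request:served")

    healer = threading.Thread(target=s.run, args=("conn-1", heal))
    healer.start()
    heal_started.wait()
    t0 = time.monotonic()
    s.run("conn-1", client_request)  # blocks until heal() returns
    blocked_for = time.monotonic() - t0
    healer.join()
    return log, blocked_for


log, blocked_for = heal_race()
print(log)
# prints: ['heal:start', 'heal:end', 'request:served']
```

The client request is served only after the heal finishes, and `blocked_for` is roughly `heal_duration`; in the real system that delay would be the whole NSMGR heal wait rather than a fixed sleep.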


denis-tingaikin · 2020-12-07T10:38:57Z

denis-tingaikin added the bug label Dec 7, 2020

artem-belov self-assigned this Jan 12, 2021

artem-belov mentioned this issue Feb 25, 2021

Heal rework #738

Merged

edwarnicke closed this as completed in #738 Mar 23, 2021