fix: adding max retry count while trying to subscribe to a nms pod #66
Changes from 1 commit
@@ -160,8 +160,9 @@ func (confClient *ConfigClient) subscribeToConfigPod(commChan chan *protos.Netwo
     logger.GrpcLog.Infoln("subscribeToConfigPod ")
     myid := os.Getenv("HOSTNAME")
     var stream protos.ConfigService_NetworkSliceSubscribeClient
+    maxRetryCount := 10
     for {
-        if stream == nil {
+        if stream == nil && maxRetryCount > 0 {
             status := confClient.Conn.GetState()
             var err error
             if status == connectivity.Ready {
@@ -171,18 +172,26 @@ func (confClient *ConfigClient) subscribeToConfigPod(commChan chan *protos.Netwo
                     logger.GrpcLog.Errorf("Failed to subscribe: %v", err)
                     time.Sleep(time.Second * 5)
                     // Retry on failure
+                    maxRetryCount--
                     continue
                 }
             } else if status == connectivity.Idle {
                 logger.GrpcLog.Errorf("Connectivity status idle, trying to connect again")
                 time.Sleep(time.Second * 5)
+                maxRetryCount--
                 continue
             } else {
                 logger.GrpcLog.Errorf("Connectivity status not ready")
                 time.Sleep(time.Second * 5)
+                maxRetryCount--
                 continue
             }
         }
+
+        if stream == nil {
+            break
+        }
+
         rsp, err := stream.Recv()
        if err != nil {
            logger.GrpcLog.Errorf("Failed to receive message: %v", err)

First - Could you please confirm that in your case SMF crashes at line 191 after 10 retries....

@gatici, I am seeing the same thing as @thakurajayL indicated here.
What happens if we fail? I thought unlimited retry is better, right?
What if webconsole does not come up within 50 seconds (10 retries * 5 s)?
Hello,
The real problem here is that if SMF has a stale IP resolution for NMS (because the NMS pod was restarted), the loop never exits, and without manually restarting the SMF pod, SMF cannot connect to the webui again.
Putting in a max retry count causes the SMF pod to restart, and it then gets the correct gRPC server address. It does not matter whether the gRPC server is up or not; until the gRPC server is started, this process repeats.
That helps us recover 5G configuration synchronization. Otherwise, we would need to detect every module that has connection issues with NMS and restart them manually.
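To make that recovery path concrete, here is a minimal sketch of the behavior being described, assuming the caller treats "out of retries" as fatal and that the pod runs with a Kubernetes restart policy that brings it back up. subscribeWithRetry and the dial callback are hypothetical stand-ins, not code from this PR:

// Minimal sketch of the recovery path, not the actual openapi client code.
package main

import (
    "errors"
    "log"
    "time"
)

// subscribeWithRetry tries to establish the subscription at most
// maxRetryCount times, sleeping 5 s between attempts (matching the patch),
// and reports whether it succeeded.
func subscribeWithRetry(dial func() error, maxRetryCount int) bool {
    for ; maxRetryCount > 0; maxRetryCount-- {
        if err := dial(); err == nil {
            return true // subscribed; the real loop would start stream.Recv() here
        }
        time.Sleep(5 * time.Second)
    }
    return false
}

func main() {
    ok := subscribeWithRetry(func() error {
        // placeholder for the gRPC NetworkSliceSubscribe attempt
        return errors.New("connection refused: stale NMS address")
    }, 10)
    if !ok {
        // Exiting instead of looping forever is what lets Kubernetes restart
        // the pod; the restarted pod resolves the NMS/webui hostname again
        // and picks up the new pod IP.
        log.Fatal("could not subscribe to config pod after 10 retries")
    }
}

The important part is the final exit: turning "out of retries" into a process exit is what converts a stuck loop into a pod restart and a fresh DNS lookup.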
Hello @thakurajayL, here are the full logs of NRF: https://pastebin.ubuntu.com/p/4cPQVvVdWv/
NMS was restarted and NRF lost its connection to NMS. After 10 retries, NRF was restarted and the gRPC connection was recovered.
I understand what you are trying to solve. Could you point me to the code where SMF would restart after all MaxRetry attempts?
In SMF, https://github.com/omec-project/smf/blob/master/factory/factory.go#L52 calls the PublishOnConfigChange function. PublishOnConfigChange then calls the subscribeToConfigPod function, which tries to connect to the gRPC server in an infinite loop. This loop uses SmfConfig.Configuration.WebuiUri, which can resolve to different IP addresses if the NMS pod restarts. However, the loop never exits, so manual intervention is required after every NMS pod restart.
As a result, it will restart on commChannel := gClient.PublishOnConfigChange(false).
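For readers following along, a compressed sketch of that flow. Only factory.go, PublishOnConfigChange, subscribeToConfigPod, and SmfConfig.Configuration.WebuiUri are real names from the smf/openapi code; the types below are simplified stand-ins:

// Compressed sketch of the call chain described above, not the actual code.
package main

import (
    "fmt"
    "time"
)

type configClient struct {
    webuiURI string // filled from SmfConfig.Configuration.WebuiUri at startup
}

// PublishOnConfigChange is the entry point reached from factory.go#L52.
func (c *configClient) PublishOnConfigChange() chan string {
    commChan := make(chan string)
    go c.subscribeToConfigPod(commChan)
    return commChan
}

// subscribeToConfigPod mirrors the loop under discussion: before this PR it
// retries forever against whatever address the connection resolved when it
// was first created, so an NMS pod that comes back with a new IP leaves the
// loop spinning until someone restarts the SMF pod by hand.
func (c *configClient) subscribeToConfigPod(commChan chan string) {
    for {
        fmt.Printf("trying to subscribe via %s ...\n", c.webuiURI)
        time.Sleep(5 * time.Second)
        // on success the real code enters a stream.Recv() loop and pushes
        // config updates into commChan
    }
}

func main() {
    gClient := &configClient{webuiURI: "webui:9876"}
    commChannel := gClient.PublishOnConfigChange()
    _ = commChannel
    select {} // block, as the SMF process does
}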
I am completely lost. I understand the high-level reasoning of resolving the address again, but I am not able to connect it to the code the way you are explaining it. Give me some time.
I guess we need to change the way the Client is initialized. Clubbing the two lines below into a single call will help in getting ... but I do not see how your change works, or why it works.
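One possible reading of that suggestion, offered purely as an assumption since the two lines referred to are not shown above: dial the webui URI and create the subscription in the same call, inside the retry loop, so every attempt starts from the hostname again instead of reusing a connection created once at startup. dialAndSubscribe and the "webui:9876" address are made up for illustration:

// Sketch only; not the reviewer's actual proposal.
package main

import (
    "context"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// dialAndSubscribe stands in for a combined "connect + NetworkSliceSubscribe"
// step: creating a fresh ClientConn per attempt means the resolver starts
// from the hostname each time, so a restarted NMS pod can be found again.
func dialAndSubscribe(ctx context.Context, webuiURI string) (*grpc.ClientConn, error) {
    conn, err := grpc.Dial(webuiURI, grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        return nil, err
    }
    // the real client would create protos.NewConfigServiceClient(conn) here
    // and return any error from NetworkSliceSubscribe(ctx, ...)
    return conn, nil
}

func main() {
    ctx := context.Background()
    for retries := 10; retries > 0; retries-- {
        conn, err := dialAndSubscribe(ctx, "webui:9876")
        if err == nil {
            log.Println("subscribed")
            conn.Close()
            return
        }
        log.Printf("subscribe attempt failed: %v", err)
        time.Sleep(5 * time.Second)
    }
    log.Fatal("giving up so the pod can be restarted")
}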