Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep looking for master process #617

Merged
merged 15 commits into from
Mar 27, 2024
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion scripts/docker/nginx-oss/deb/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -43,4 +43,4 @@ RUN apt install -y /agent/build/$PACKAGE_NAME.deb

FROM install-nginx as install-agent-repo

RUN apt-get update && apt-get install -y nginx-agent
RUN apt-get update && apt-get install -y nginx-agent
51 changes: 37 additions & 14 deletions src/plugins/registration.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ const (
dataplaneSoftwareDetailsMaxWaitTime = time.Duration(5 * time.Second)
// Time between attempts to gather DataplaneSoftwareDetails
softwareDetailsOperationInterval = time.Duration(1 * time.Second)
nginxStartMaxWaitTime = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not essentially turn off backoff altogether, if the MaxElapsedTime is zero? If that is intentional, then I would maybe adding a comment explaining why it is turned off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the max wait time is set to 0 backoff will continue waiting indefinitely until the master process is found. I'll add in a comment now

)

type OneTimeRegistration struct {
Expand Down Expand Up @@ -93,13 +94,16 @@ func (r *OneTimeRegistration) Process(msg *core.Message) {
defer r.dataplaneSoftwareDetailsMutex.Unlock()
r.dataplaneSoftwareDetails[data.GetPluginName()] = data.GetDataplaneSoftwareDetails()
}
case msg.Exact(core.NginxDetailProcUpdate):
r.processes = msg.Data().([]*core.Process)
Copy link
Contributor

@olli-holmala olli-holmala Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a type check here, to ensure that the payload of the message is of the correct type (see the previous case).

If the payload were to contain something other than []*core.Process, then the code would panic, which is not great. Even though it should never have anything else than a []*core.Process based on what the code looks like today, you never know how the code base will change in the future. As a rule of thumb, it's always good practice to perform type checking to ensure safe code.

}
}

func (r *OneTimeRegistration) Subscriptions() []string {
return []string{
core.RegistrationCompletedTopic,
core.DataplaneSoftwareDetailsUpdated,
core.NginxDetailProcUpdate,
}
}

Expand Down Expand Up @@ -136,7 +140,6 @@ func (r *OneTimeRegistration) areDataplaneSoftwareDetailsReady() error {
log.Trace("No extension plugins to register")
return nil
}

r.dataplaneSoftwareDetailsMutex.Lock()
defer r.dataplaneSoftwareDetailsMutex.Unlock()

Expand All @@ -153,23 +156,43 @@ func (r *OneTimeRegistration) areDataplaneSoftwareDetailsReady() error {
func (r *OneTimeRegistration) registerAgent() {
var details []*proto.NginxDetails

for _, proc := range r.processes {
// only need master process for registration
if proc.IsMaster {
nginxDetails := r.binary.GetNginxDetailsFromProcess(proc)
details = append(details, nginxDetails)
// Reading nginx config during registration to populate nginx fields like access/error logs, etc.
_, err := r.binary.ReadConfig(nginxDetails.GetConfPath(), nginxDetails.NginxId, r.env.GetSystemUUID())
if err != nil {
log.Warnf("Unable to read config for NGINX instance %s, %v", nginxDetails.NginxId, err)
backoffSetting := backoff.BackoffSettings{
InitialInterval: softwareDetailsOperationInterval,
MaxInterval: softwareDetailsOperationInterval,
MaxElapsedTime: nginxStartMaxWaitTime,
Jitter: backoff.BACKOFF_JITTER,
Multiplier: backoff.BACKOFF_MULTIPLIER,
}

findMaster := func() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
findMaster := func() error {
findNginxMasterProcesses := func() error {

for _, proc := range r.processes {
// only need master process for registration
if proc.IsMaster {
nginxDetails := r.binary.GetNginxDetailsFromProcess(proc)
details = append(details, nginxDetails)
if len(details) > 0 {
// Reading nginx config during registration to populate nginx fields like access/error logs, etc.
_, err := r.binary.ReadConfig(nginxDetails.GetConfPath(), nginxDetails.NginxId, r.env.GetSystemUUID())
if err != nil {
log.Warnf("Unable to read config for NGINX instance %s, %v", nginxDetails.NginxId, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this line isn't originally your code, but I'm wondering if we should actually be returning the error here instead of merely logging it 🤔

@oliveromahony, @dhurley, @Dean-Coakley, @aphralG thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the reason we dont return an error here is because we dont want to stop registration just because we cant read the config. If the config is invalid it should not stop the agent from performing other tasks like reporting the status of NGINX is that makes sense.

}
log.Info("Master process has been found")
return nil
}
} else {
log.Tracef("NGINX non-master process: %d", proc.Pid)
return nil
Copy link
Contributor

@olli-holmala olli-holmala Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return causes the whole function to exit, meaning that the function will only ever find the master process if it is the first process in r.processes. I would remove this return.

}
} else {
log.Tracef("NGINX non-master process: %d", proc.Pid)
}
return fmt.Errorf("No master process found, waiting...")
}
if len(details) == 0 {
log.Info("No master process found")
err := backoff.WaitUntil(
context.Background(), backoffSetting, findMaster,
)
if err != nil {
log.Warn(err.Error())
Copy link
Contributor

@olli-holmala olli-holmala Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously:

    log.Info("No master process found")

But my instinct is that this should be an error log, as I'd think we can't complete registration if we haven't found a master process.

Others could also weigh in on this, as I wouldn't consider myself an expert on registration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe previously we would continue with registration even if we didn't find master processes but now that we are waiting until we find one I think it should be an error log message.

log.Errorf("Unable to find NGINX master processes, %v", err)

}

updated, err := types.TimestampProto(r.config.Updated)
if err != nil {
log.Warnf("failed to parse proto timestamp %s: %s, assuming now", r.config.Updated, err)
Expand Down
5 changes: 2 additions & 3 deletions src/plugins/registration_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ import (
tutils "github.com/nginx/agent/v2/test/utils"
)

func TestRegistration_Process(t *testing.T) {
func TestRegistration_startRegistration(t *testing.T) {
tests := []struct {
name string
expectedMessageCount int
Expand Down Expand Up @@ -64,7 +64,6 @@ func TestRegistration_Process(t *testing.T) {
messages := messagePipe.GetMessages()

assert.Equal(tt, messages[0].Topic(), core.CommRegister)
// host info checked elsewhere
assert.NotNil(tt, messages[0].Data())

assert.Equal(tt, messages[1].Topic(), core.RegistrationCompletedTopic)
Expand All @@ -88,7 +87,7 @@ func TestRegistration_areDataplaneSoftwareDetailsReady(t *testing.T) {
func TestRegistration_Subscriptions(t *testing.T) {
pluginUnderTest := NewOneTimeRegistration(tutils.GetMockAgentConfig(), nil, tutils.GetMockEnv(), nil, tutils.GetProcesses())

assert.Equal(t, []string{core.RegistrationCompletedTopic, core.DataplaneSoftwareDetailsUpdated}, pluginUnderTest.Subscriptions())
assert.Equal(t, []string{core.RegistrationCompletedTopic, core.DataplaneSoftwareDetailsUpdated, core.NginxDetailProcUpdate}, pluginUnderTest.Subscriptions())
}

func TestRegistration_Info(t *testing.T) {
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading