chore: remove unnecessary complexity in the data pipeline #347
Conversation
LGTM
```go
select {
case <-ctx.Done():
	return
default:
}
```
This can be updated to use `ctx.Err()`, if you think it's worth the change:

```go
if ctx.Err() != nil {
	return
}
```
```go
if s.cfg.DisableAnalysis {
	return
}

s.status.Update(model.DatapipeStatusAnalyzing, false)
log.Measure(log.LevelInfo, "Starting analysis")()
log.LogAndMeasure(log.LevelInfo, "Graph Analysis")()
```
does this need to be deferred?
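For context on the question, here is a hypothetical sketch of a Measure-style helper (the real `log.Measure` signature is not shown in this excerpt) that illustrates why deferring the returned func matters:

```go
package main

import (
	"fmt"
	"time"
)

// measure is a hypothetical stand-in for a Measure-style logger: it
// records a start time and returns a func that reports the elapsed
// time when invoked.
func measure(name string) func() {
	start := time.Now()
	fmt.Println("start:", name)
	return func() {
		fmt.Printf("done: %s after %v\n", name, time.Since(start))
	}
}

func analyze() {
	// Without defer, the trailing () invokes the returned func
	// immediately, so the measured duration is effectively zero:
	measure("immediate")()

	// With defer, the returned func runs when analyze returns, so it
	// captures the whole function's duration:
	defer measure("deferred")()

	time.Sleep(10 * time.Millisecond) // simulated work
}

func main() {
	analyze()
}
```

If the intent is to time the analysis itself, the deferred form is the one that does so; an immediate invocation only logs the start/stop pair back to back.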
```go
if fileUploadJobsUnderAnalysis, err := s.db.GetFileUploadJobsWithStatus(model.JobStatusAnalyzing); err != nil {
	return 0, err
} else {
	numJobsWaitingForAnalysis += len(fileUploadJobsUnderAnalysis)
}
```
This looks like it could just be an assignment, although the increment is equivalent.
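A minimal sketch of the suggested simplification, using a hypothetical `countJobsUnderAnalysis` helper in place of the real store call:

```go
package main

import "fmt"

// countJobsUnderAnalysis is an illustrative helper; jobs stands in for
// the slice returned by the real GetFileUploadJobsWithStatus call.
func countJobsUnderAnalysis(jobs []string) int {
	// Direct assignment, as suggested: equivalent to `+=` when the
	// counter starts at zero and nothing has been added before this point.
	numJobsWaitingForAnalysis := len(jobs)
	return numJobsWaitingForAnalysis
}

func main() {
	fmt.Println(countJobsUnderAnalysis([]string{"job-1", "job-2"})) // → 2
}
```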
Description
This changeset removes several unnecessary components of the data pipeline. The original intent behind the in-memory retention of completed file upload jobs was to trigger analysis immediately upon completion of the file upload. However, the addition of configuration for the loop ticker, along with a clearer path for triggering a data pipeline loop run, removes the need for this complexity.
This also closes a gap where a timed-out file upload job could have its status overwritten later during analysis.
Motivation and Context
The data pipeline is already complex enough as it is. Let's not make it harder on ourselves to maintain this part of the system.
How Has This Been Tested?
Updated unit tests.