Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: remove unnecessary complexity in the data pipeline #347

Merged
merged 2 commits into from
Jan 29, 2024

Conversation

zinic
Copy link
Collaborator

@zinic zinic commented Jan 24, 2024

Description

This changeset removes several unnecessary components of the data pipeline. The original intent for the in-memory retention of completed file upload jobs was to trigger analysis immediately upon completion of file-upload. However, addition of configuration for the loop ticker and a clearer path for triggering a data pipeline loop run removes the need for this complexity.

This also removes a gap where a timed out file upload job could have its status overwritten later on during analysis.

Motivation and Context

The data pipeline is already complex enough, as-is. Let's not make it harder on ourselves to maintain this part of the system.

How Has This Been Tested?

Updated unit tests.

Types of changes

  • Chore (a change that does not modify the application functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Documentation updates are needed, and have been made accordingly.
  • I have added and/or updated tests to cover my changes.
  • All new and existing tests passed.
  • My changes include a database migration.

@zinic zinic force-pushed the datapipe-rework branch 3 times, most recently from cb5ed23 to 6916687 Compare January 26, 2024 17:10
Copy link
Contributor

@superlinkx superlinkx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zinic zinic merged commit a232198 into main Jan 29, 2024
3 checks passed
@zinic zinic deleted the datapipe-rework branch January 29, 2024 16:55
@github-actions github-actions bot locked and limited conversation to collaborators Jan 29, 2024
Comment on lines +100 to 104
select {
case <-ctx.Done():
return
default:
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be updated to use Ctx.Err() if you think it's worth the change:

if ctx.Err() != nil {
    return
}

if s.cfg.DisableAnalysis {
return
}

s.status.Update(model.DatapipeStatusAnalyzing, false)
log.Measure(log.LevelInfo, "Starting analysis")()
log.LogAndMeasure(log.LevelInfo, "Graph Analysis")()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be deferred?

if fileUploadJobsUnderAnalysis, err := s.db.GetFileUploadJobsWithStatus(model.JobStatusAnalyzing); err != nil {
return 0, err
} else {
numJobsWaitingForAnalysis += len(fileUploadJobsUnderAnalysis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could just be an assignment, although the increment is equivalent.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants