Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scheduling for Entity Table Check Job #94

Merged
merged 9 commits into from
Nov 14, 2023

Conversation

ericaporter
Copy link
Contributor

@ericaporter ericaporter commented Oct 26, 2023

The previous iteration of this was low maintenance self-scheduling but when tested on Register, it was not scheduling properly.

This iteration is for services who have Sidekiq and Sidekiq cron already installed (or who want to install them).

Syntax is updated to work with Sidekiq and README.rb is updated to give instructions on how to set up the job.

Services will have more control over scheduling and the job will be more reliable.

@ericaporter ericaporter requested a review from asatwal October 26, 2023 15:22
Copy link
Collaborator

@asatwal asatwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericaporter - A few comments for your perusal.

Also we need to change the way we calculate the checksum. The checksum needs to be done on the postgres server rather than it being done on the rails server. I've updated this trello ticket here with the details. I expect we'll do that as a seperate PR ?

@@ -36,7 +36,7 @@ def send_initialisation_events
DfE::Analytics::SendEvents.perform_now([initialise_analytics_event])
end

DfE::Analytics::EntityTableCheckJob.perform_later if DfE::Analytics.entity_table_checks_enabled?
DfE::Analytics::EntityTableCheckJob.perform_async if defined?(Sidekiq) && DfE::Analytics.entity_table_checks_enabled?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this now that the job is scheduled by SideKiq ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 now updated

# Reschedules with sidekiq_cron to run every 24hours
class EntityTableCheckJob
include Sidekiq::Worker if defined?(Sidekiq::Worker)

TIME_ZONE = 'London'

def perform
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check here if the entity table job is enabled and return early if it is not. ie

return unless DfE::Analytics.entity_table_checks_enabled?

We can across a situation in Register where the job was disabled, but it was still being scheduled, so this will prevent that situation.

Copy link
Contributor Author

@ericaporter ericaporter Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ericaporter
Copy link
Contributor Author

@ericaporter - A few comments for your perusal.

Also we need to change the way we calculate the checksum. The checksum needs to be done on the postgres server rather than it being done on the rails server. I've updated this trello ticket here with the details. I expect we'll do that as a seperate PR ?

@asatwal Thanks for reviewing! Will take a look at those requested updates. Separate PR for the SQL update was my intention, wondering if we could deploy them in succession and do a bit of benchmarking?

README.md Outdated
@@ -324,13 +324,23 @@ which means they can't be joined up.

### Entity Table Check Job

The entity table check job will run every night and sends data to verify that the latest version of an entity table in BigQuery matches the database. The job is defaulted to false but can be changed by updating the configuration option in
`config/initializers/dfe_analytics.rb`:
If you have *Sidekiq* and *Sidekiq-Cron* installed, you might want to enable the *Entity Table Check Job* to verify that the latest version of an entity table in Big Query matches the database. Ideally the job would be scheduled to run every night.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question here. Do we have to limit this to SideKiq ? I expect this would still work if they use Rescue or Delayed job for example. It's just we've passed on the responsibility of scheduling to the application.

Copy link
Contributor Author

@ericaporter ericaporter Nov 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't need to limit it, it would still work if we revert to using perform_later within the job rather than using perform_async - depending on how Rescue or Delayed Job is set up they might have to change the perform_later - but perform_later I think would still be ok without being updated if Sidekiq is configured as the ActiveJob queue adapter

class EntityTableCheckJob < AnalyticsJob
WAIT_TIME = Date.tomorrow.midnight
# Reschedules with sidekiq_cron to run every 24hours
class EntityTableCheckJob
Copy link
Collaborator

@asatwal asatwal Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyticsJob contains the DfE analytics queue as well as the retry config, could we not keep this ?

Also my knowledge on this isn't great, but do we necessarily need to include Sidekiq::Worker ? Is it required for cron scheduling ?

Copy link
Contributor Author

@ericaporter ericaporter Nov 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thinking was that we would need to include Sidekiq if services do not have Sidekiq installed as the active job adapter, but from what I can see, this is now the standard Sidekiq setup and a check of the services - they all have it installed like this so I can remove the include and also revert to inheriting from AnalyticsJob

@ericaporter ericaporter changed the title Update EntityTableCheckJob to use Sidekiq and Sidekiq cron Update scheduling for Entity Table Check Job Nov 5, 2023
Copy link
Collaborator

@asatwal asatwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice work !

@ericaporter ericaporter merged commit 6a7aa14 into main Nov 14, 2023
7 checks passed
@ericaporter ericaporter deleted the scheduling-for-checksum branch November 14, 2023 10:58
@ericaporter ericaporter mentioned this pull request Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants