-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update scheduling for Entity Table Check Job #94
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericaporter - A few comments for your perusal.
Also we need to change the way we calculate the checksum. The checksum needs to be done on the postgres server rather than it being done on the rails server. I've updated this trello ticket here with the details. I expect we'll do that as a seperate PR ?
@@ -36,7 +36,7 @@ def send_initialisation_events | |||
DfE::Analytics::SendEvents.perform_now([initialise_analytics_event]) | |||
end | |||
|
|||
DfE::Analytics::EntityTableCheckJob.perform_later if DfE::Analytics.entity_table_checks_enabled? | |||
DfE::Analytics::EntityTableCheckJob.perform_async if defined?(Sidekiq) && DfE::Analytics.entity_table_checks_enabled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this now that the job is scheduled by SideKiq ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 now updated
# Reschedules with sidekiq_cron to run every 24hours | ||
class EntityTableCheckJob | ||
include Sidekiq::Worker if defined?(Sidekiq::Worker) | ||
|
||
TIME_ZONE = 'London' | ||
|
||
def perform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check here if the entity table job is enabled and return early if it is not. ie
return unless DfE::Analytics.entity_table_checks_enabled?
We can across a situation in Register where the job was disabled, but it was still being scheduled, so this will prevent that situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@asatwal Thanks for reviewing! Will take a look at those requested updates. Separate PR for the SQL update was my intention, wondering if we could deploy them in succession and do a bit of benchmarking? |
README.md
Outdated
@@ -324,13 +324,23 @@ which means they can't be joined up. | |||
|
|||
### Entity Table Check Job | |||
|
|||
The entity table check job will run every night and sends data to verify that the latest version of an entity table in BigQuery matches the database. The job is defaulted to false but can be changed by updating the configuration option in | |||
`config/initializers/dfe_analytics.rb`: | |||
If you have *Sidekiq* and *Sidekiq-Cron* installed, you might want to enable the *Entity Table Check Job* to verify that the latest version of an entity table in Big Query matches the database. Ideally the job would be scheduled to run every night. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question here. Do we have to limit this to SideKiq ? I expect this would still work if they use Rescue or Delayed job for example. It's just we've passed on the responsibility of scheduling to the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we don't need to limit it, it would still work if we revert to using perform_later
within the job rather than using perform_async
- depending on how Rescue or Delayed Job is set up they might have to change the perform_later
- but perform_later
I think would still be ok without being updated if Sidekiq is configured as the ActiveJob queue adapter
class EntityTableCheckJob < AnalyticsJob | ||
WAIT_TIME = Date.tomorrow.midnight | ||
# Reschedules with sidekiq_cron to run every 24hours | ||
class EntityTableCheckJob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnalyticsJob
contains the DfE analytics queue as well as the retry config, could we not keep this ?
Also my knowledge on this isn't great, but do we necessarily need to include Sidekiq::Worker
? Is it required for cron scheduling ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thinking was that we would need to include Sidekiq if services do not have Sidekiq installed as the active job adapter, but from what I can see, this is now the standard Sidekiq setup and a check of the services - they all have it installed like this so I can remove the include and also revert to inheriting from AnalyticsJob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice work !
The previous iteration of this was low maintenance self-scheduling but when tested on Register, it was not scheduling properly.
This iteration is for services who have Sidekiq and Sidekiq cron already installed (or who want to install them).
Syntax is updated to work with Sidekiq and README.rb is updated to give instructions on how to set up the job.
Services will have more control over scheduling and the job will be more reliable.