-
Notifications
You must be signed in to change notification settings - Fork 384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add URL validation REST endpoint and WP cron task #5228
Conversation
…tibility-scanning
…tibility-scanning
Plugin builds for 2e23118 are ready 🛎️!
|
@johnwatkins0 We probably don't want to validate every URL. That would most likely just result in a lot of computational and storage expense without much added benefit. What we really need is rather is to validate URLs for each unique template and post type. Basically, the template hierarchy as shown in the supported templates settings section, combined with the post types. I propose that we get the first published post of each post type for scanning. Then each time validation needs to be performed, we'll have a consistent set of URLs to examine. If a given URL already has an |
A possible correction to what I said. We should be spawning a cron to validate all posts, but only when a published post is updated. In other words, if a user has DevTools turned off, instead of validation happening asynchronously when the post is updated, the update should schedule an immediate cron job to do the validation request. When cron is validating published post that has been updated, it should check to see if the URL has already been validated. If so, go ahead and store the results to make sure it is fresh. Otherwise, if it hasn't been validated before, check to see if the validation errors are already being reported for other validated URLs: if so, skip storing the results. This will prevent the validated URL posts from continuously filling up the DB with essentially the same results over and over again. The main thing that this cron job is for is finding new validation errors which haven't been encountered before, and it is these that we will alert the technical administrator of (including in the SIte Scan summary and Site Health test). |
@westonruter I might be confused about the distinction you're making between what happens when a user has dev tools on and what happens when they have dev tools off. If I'm reading correctly, it sounds like with dev tools on we will preserve the current behavior, with validation occurring on But when the user has dev tools off, we spawn a single scheduled event on post update to make sure no errors have occurred. Overall, in the no-dev-tools scenario, the main thing we care about is errors surfaced via the error taxonomy, and we want to do this with as few URL validation posts as possible. Question: What's the reason for using cron instead of the same I might not be following this correctly. In either case, I think we will still want to schedule a single immediate cron job on plugin activation to try to prime some URLs right away, regardless of the dev tools setting. |
That is correct. When DevTools are turned on, then validation happens with each user save.
The main thing we care about is if there are new validation errors being reported which don't currently have
Because using
Yes. Whenever the state of the site changes, such as a theme being updated or a plugin being installed, then this should result in re-validation of a set of representative URLs. This scan should happens on a regular basis (hourly) at which it checks to see if |
Thanks, this is making sense. I will revisit that part of this PR and request review when ready. |
Closing this in favor of 2-3 smaller forthcoming PRs. |
Summary
Fixes #1756
This relates to #4719 and #4795 also but does not resolve them because they involve the creation of new theme/plugin compatibility scanning features.
This refactors the existing WP CLI URL validation script into a set of classes to be used by (1) the CLI script, (2) a REST endpoint for URL validation, and (3) a WP cron task to validate URLs in the background. This also introduces classes for the REST endpoint and the cron task.
Changes
AMP_CLI_Validation_Command
into a newValidationURLProvider
class. This class provides URLs to validate, with an optional offset.AMP_CLI_Validation_Command
into a newURLValidationProvider
class.URLValidationProvider
has a locking mechanism to help prevent multiple processes from validating URLs at the same time.AMP_CLI_Validation_Command
to adjust to the new changes, including adding an error message if validation is locked (which will happen if a user tries to run the script while the cron process is running).ValidationURLsRESTController
wrapsValidationURLProvider
, andURLValidationRESTController
wrapsURLValidationProvider
.URLValidationCron
to handle the cron task. As of now, the cron task runs every hour, validating two URLs of each content type on each run, and incrementing by an offset of 2 each time. The interval and offset are adjustable via filters. We might want to change the defaults because at this rate it could take years for the cron to validate every URL on a very large site.How it will work
ValidationURLsRESTController
.URLValidationRESTController
. The REST controller will iterate through the URLs. With each URL it will first attempt to get stored URL validation data. If a URL doesn't have stored data, then it will be revalidated. After one URL is revalidated, the REST controller will send back a response with the validation data and the list of URLs still to be checked. The JS application will then make additional requests until there are no more URLs to check (probably with an escape after a certain number of requests), storing results in its own state along the way. With this approach, in the event that every URL has nonstale stored results, the JS app may only make a single request.locked
boolean, and we'll have to work through the exact approach when we work on the JS side. It could involve sleeping a few seconds and trying again, and then quitting after a few cycles with a friendly message.Checklist