Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cron event to validate individual URLs #6520

Merged
merged 49 commits into from
Oct 21, 2021
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
6feb425
Update URL validation cron jobs
dhaval-parekh Aug 9, 2021
2ce519a
Add unit test cases
dhaval-parekh Aug 9, 2021
0e8fa73
Validate single URL on URL validation cron, Update test cases
dhaval-parekh Oct 1, 2021
a7aeb61
Hard core the site option key that need to delete
dhaval-parekh Oct 7, 2021
d7f1659
Merge branch 'develop' into dev-tool/5750-refactor-url-validation-cron
dhaval-parekh Oct 7, 2021
9dcb139
Update comments
dhaval-parekh Oct 7, 2021
dc72b8a
Remove unused references
dhaval-parekh Oct 7, 2021
a67cadd
Merge branch 'develop' of github.com:ampproject/amp-wp into dev-tool/…
westonruter Oct 12, 2021
4f1630e
Remove return of empty array
westonruter Oct 12, 2021
c085388
Replace extraneous URLValidationProvider with direct AMP_Validation_M…
westonruter Oct 12, 2021
2a05e8f
Update tests to use DependencyInjectedTestCase
westonruter Oct 12, 2021
f41681f
Simplify URL validation scheduling by putting the queue in URLValidat…
westonruter Oct 12, 2021
422d0f2
Re-schedule with new recurrence when schedule changes
westonruter Oct 13, 2021
bf25b0c
Unschedule event with wrong recurrence instead of rescheduling it
westonruter Oct 13, 2021
e7f51ea
Fix unit test case
dhaval-parekh Oct 13, 2021
5f56cc1
Revert "Fix unit test case"
dhaval-parekh Oct 13, 2021
3f803b0
Ensure that AMP-enabled posts are scanned
westonruter Oct 14, 2021
fc2c727
Remove obsolete force argument for amp validation CLI command
westonruter Oct 14, 2021
72ffb2d
Provide wp_get_scheduled_event() implementation for WP<5.1 and add tests
westonruter Oct 14, 2021
b5afb54
Try not using past time for scheduling event
westonruter Oct 14, 2021
7371fd4
Use daily schedule to test since available in older WP versions
westonruter Oct 14, 2021
dab728c
Merge branch 'develop' of github.com:ampproject/amp-wp into dev-tool/…
westonruter Oct 14, 2021
afcbcce
Fix covers phpdoc tags
westonruter Oct 14, 2021
ca7b70d
Use assertLessThanOrEqual rather than assertGreaterThanOrEqual
westonruter Oct 14, 2021
d1ccdaf
Fix covers phpdoc tag for wrap_block_callbacks
westonruter Oct 14, 2021
2e2f363
Account for wp_schedule_event() returning bool as of WP 5.1
westonruter Oct 14, 2021
4144830
Remove obsolete arg from URLScanningContext constructor
westonruter Oct 14, 2021
79d678f
Remove unused amp_url_validation_limit_per_type filter
westonruter Oct 14, 2021
da77a0d
Allow options to be overridden when calling get_supportable_templates
westonruter Oct 15, 2021
5bbadea
Refactor ScannableURLProvider and URLValidationProvider into services
westonruter Oct 15, 2021
ab2bff1
Include more AMP settings in validated environment
westonruter Oct 15, 2021
f0c7b3b
Fix tests after 5bbadea
westonruter Oct 15, 2021
55c0ea7
Clear out cron validation queue whenever validated environment changes
westonruter Oct 15, 2021
042a74c
Remove obsolete test for amp_should_use_new_onboarding
westonruter Oct 15, 2021
4e0dda5
Fix test assertions by using polyfilled TestCase
westonruter Oct 15, 2021
42a2c35
Remove unused DEFAULT_INTERVAL_WEEKLY and fix test_process_and_dequeu…
westonruter Oct 15, 2021
acf93dd
Use get_year_link() to obtain the year archive link and ensure it has…
westonruter Oct 15, 2021
2e94ba4
Ensure author links are not 404
westonruter Oct 15, 2021
6d25e3c
Improve test coverage
westonruter Oct 15, 2021
1b1b7b1
Fix test for get_author_page_urls
westonruter Oct 15, 2021
27b18de
Fix get_date_page test in WP<=5.1
westonruter Oct 15, 2021
5820ce8
Merge branch 'develop' of github.com:ampproject/amp-wp into dev-tool/…
westonruter Oct 18, 2021
5f5fe35
Merge branch 'develop' of github.com:ampproject/amp-wp into dev-tool/…
westonruter Oct 20, 2021
fce116d
Create a post before going to the onboarding wizard so there is somet…
westonruter Oct 20, 2021
8f2985a
Create test post and page using REST API
delawski Oct 21, 2021
43899d3
Do not show site preview if there are no preview URLs
delawski Oct 21, 2021
331955f
Clean up site preview E2E test
delawski Oct 21, 2021
8376583
Fix AMP Settings Review panel E2E test
delawski Oct 21, 2021
284e489
Harden check for post being set
westonruter Oct 21, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions includes/uninstall-functions.php
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ function delete_options() {
delete_option( 'amp-options' );
delete_option( 'amp_css_transient_monitor_time_series' );
delete_option( 'amp_customize_setting_modified_timestamps' );
delete_option( 'amp_url_validation_queue' ); // See Validation\URLValidationCron::OPTION_KEY.

$theme_mod_name = 'amp_customize_setting_modified_timestamps';
remove_theme_mod( $theme_mod_name );
Expand Down
1 change: 1 addition & 0 deletions src/BackgroundTask/CronBasedBackgroundTask.php
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ abstract class CronBasedBackgroundTask implements Service, Registerable {
const DEFAULT_INTERVAL_HOURLY = 'hourly';
const DEFAULT_INTERVAL_TWICE_DAILY = 'twicedaily';
const DEFAULT_INTERVAL_DAILY = 'daily';
const DEFAULT_INTERVAL_WEEKLY = 'weekly';

/**
* BackgroundTaskDeactivator instance.
Expand Down
14 changes: 10 additions & 4 deletions src/BackgroundTask/RecurringBackgroundTask.php
Original file line number Diff line number Diff line change
Expand Up @@ -37,13 +37,19 @@ final public function schedule_event( ...$args ) {
}

$event_name = $this->get_event_name();
$timestamp = wp_next_scheduled( $event_name, $args );
$recurrence = $this->get_interval();

if ( $timestamp ) {
return;
$scheduled_event = wp_get_scheduled_event( $event_name, $args );

// Unschedule any existing event which had a differing recurrence.
if ( $scheduled_event && $scheduled_event->schedule !== $recurrence ) {
wp_unschedule_event( $scheduled_event->timestamp, $event_name, $args );
$scheduled_event = null;
}

wp_schedule_event( time(), $this->get_interval(), $event_name, $args );
if ( ! $scheduled_event ) {
wp_schedule_event( time(), $recurrence, $event_name, $args );
}
}

/**
Expand Down
21 changes: 20 additions & 1 deletion src/Validation/SavePostValidationEvent.php
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,26 @@ final class SavePostValidationEvent extends SingleScheduledBackgroundTask implem
* @return bool Whether needed.
*/
public static function is_needed() {
return URLValidationCron::is_needed();

/**
* Filters whether to enable URL validation cron tasks.
*
* This is a feature flag used to control whether the sample set of site URLs are scanned on a daily basis and
* whether post permalinks are scheduled for immediate validation as soon as they are updated by a user who has
* DevTools turned off. This conditional flag will be removed once Site Scanning is implemented, likely in v2.2.
*
* @link https://github.com/ampproject/amp-wp/issues/5750
* @link https://github.com/ampproject/amp-wp/issues/4779
* @link https://github.com/ampproject/amp-wp/issues/4795
* @link https://github.com/ampproject/amp-wp/issues/4719
* @link https://github.com/ampproject/amp-wp/issues/5671
* @link https://github.com/ampproject/amp-wp/issues/5101
* @link https://github.com/ampproject/amp-wp/issues?q=label%3A%22Site+Scanning%22
*
* @param bool $enabled Enabled.
* @internal
*/
return apply_filters( 'amp_temp_validation_cron_tasks_enabled', false );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking in a subsequent PR to just remove the SavePostValidationEvent entirely. We'll have a weekly cron to re-check the most recently-published post aside from the user-initiated site scan in #6610. If a user has dev tools turned off, this will do a periodic health check without the overhead of doing validation for every single post when that data may not ever be used. What's more is that there is currently no garbage collection of validated URLs (although there needs to be), and without this the database would just keep silently getting larger and larger with validation data.

cc @delawski

}

/**
Expand Down
129 changes: 56 additions & 73 deletions src/Validation/URLValidationCron.php
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,14 @@
* WP cron process to validate URLs in the background.
*
* @package AMP
* @since 2.1
* @since 2.1
*/

namespace AmpProject\AmpWP\Validation;

use AmpProject\AmpWP\BackgroundTask\BackgroundTaskDeactivator;
use AmpProject\AmpWP\BackgroundTask\RecurringBackgroundTask;
use AmpProject\AmpWP\Infrastructure\Conditional;
use AMP_Validation_Manager;

/**
* URLValidationCron class.
Expand All @@ -19,75 +19,40 @@
*
* @internal
*/
final class URLValidationCron extends RecurringBackgroundTask implements Conditional {

/**
* ScannableURLProvider instance.
*
* @var ScannableURLProvider
*/
private $scannable_url_provider;

/**
* URLValidationProvider instance.
*
* @var URLValidationProvider
*/
private $url_validation_provider;
final class URLValidationCron extends RecurringBackgroundTask {

/**
* The cron action name.
*
* Note that only one queued URL is currently validated at a time.
*
* @var string
*/
const BACKGROUND_TASK_NAME = 'amp_validate_urls';

/**
* The length of time, in seconds, to sleep between each URL validation.
* Option key to store queue for URL validation.
*
* @var int
* @var string
*/
const DEFAULT_SLEEP_TIME = 1;
const OPTION_KEY = 'amp_url_validation_queue';

/**
* Check whether the service is currently needed.
* ScannableURLProvider instance.
*
* @return bool Whether needed.
* @var ScannableURLProvider
*/
public static function is_needed() {
/**
* Filters whether to enable URL validation cron tasks.
*
* This is a feature flag used to control whether the sample set of site URLs are scanned on a daily basis and
* whether post permalinks are scheduled for immediate validation as soon as they are updated by a user who has
* DevTools turned off. This conditional flag will be removed once Site Scanning is implemented, likely in v2.2.
*
* @link https://github.com/ampproject/amp-wp/issues/5750
* @link https://github.com/ampproject/amp-wp/issues/4779
* @link https://github.com/ampproject/amp-wp/issues/4795
* @link https://github.com/ampproject/amp-wp/issues/4719
* @link https://github.com/ampproject/amp-wp/issues/5671
* @link https://github.com/ampproject/amp-wp/issues/5101
* @link https://github.com/ampproject/amp-wp/issues?q=label%3A%22Site+Scanning%22
*
* @param bool $enabled Enabled.
* @internal
*/
return apply_filters( 'amp_temp_validation_cron_tasks_enabled', false );
}
private $scannable_url_provider;

/**
* Class constructor.
* Constructor.
*
* @param BackgroundTaskDeactivator $background_task_deactivator Service that deactivates background events.
* @param ScannableURLProvider $scannable_url_provider ScannableURLProvider instance.
* @param URLValidationProvider $url_validation_provider URLValidationProvider instance.
* @param ScannableURLProvider $scannable_url_provider ScannableURLProvider instance.
*/
public function __construct( BackgroundTaskDeactivator $background_task_deactivator, ScannableURLProvider $scannable_url_provider, URLValidationProvider $url_validation_provider ) {
public function __construct( BackgroundTaskDeactivator $background_task_deactivator, ScannableURLProvider $scannable_url_provider ) {
parent::__construct( $background_task_deactivator );

$this->scannable_url_provider = $scannable_url_provider;
$this->url_validation_provider = $url_validation_provider;
$this->scannable_url_provider = $scannable_url_provider;
}

/**
Expand All @@ -96,15 +61,49 @@ public function __construct( BackgroundTaskDeactivator $background_task_deactiva
* @param mixed[] ...$args Unused callback arguments.
*/
public function process( ...$args ) { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable
$urls = $this->scannable_url_provider->get_urls();
$sleep_time = $this->get_sleep_time();
$url = $this->dequeue();
if ( $url ) {
AMP_Validation_Manager::validate_url_and_store( $url );
}
}

foreach ( $urls as $url ) {
$this->url_validation_provider->get_url_validation( $url['url'], $url['type'] );
if ( $sleep_time ) {
sleep( $sleep_time );
/**
* Dequeue to obtain URL to validate.
*
* @return string|null URL to validate or null if there is nothing queued up.
*/
protected function dequeue() {
$data = get_option( self::OPTION_KEY, [] );
if ( ! is_array( $data ) ) {
$data = [];
}
$data = array_merge(
[
'timestamp' => 0,
'urls' => [],
],
$data
);

// If there are no URLs queued, then obtain a new set.
if ( empty( $data['urls'] ) ) {
Comment on lines +99 to +100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change in the validated environment (and enabled post types and templates) should also trigger this, since we need to obtain a new set of URLs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 55c0ea7 and ab2bff1


// If it has been less than a week since the last enqueueing, then do nothing.
if ( time() - $data['timestamp'] < WEEK_IN_SECONDS ) {
return null;
}

// @todo The URLs should be contextual based on the selected template mode, in particular only singular URLs should be included if using legacy Reader mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also relates to site scanning in general in #6610. Only the available templates should be scanned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5bbadea

$data['urls'] = wp_list_pluck( $this->scannable_url_provider->get_urls(), 'url' );
$data['timestamp'] = time();
}

// If there is not a queued URL, then enqueue a new set of URLs.
$url = array_shift( $data['urls'] );

update_option( self::OPTION_KEY, $data );

return $url ?: null;
}

/**
Expand All @@ -126,22 +125,6 @@ protected function get_event_name() {
* @return string An existing interval name.
*/
protected function get_interval() {
return self::DEFAULT_INTERVAL_DAILY;
}

/**
* Provides the length of time, in seconds, to sleep between validating URLs.
*
* @return int
*/
private function get_sleep_time() {

/**
* Filters the length of time to sleep between validating URLs.
*
* @since 2.1
* @param int The number of seconds. Default 1. Setting to 0 or a negative numbers disables all throttling.
*/
return max( (int) apply_filters( 'amp_url_validation_sleep_time', self::DEFAULT_SLEEP_TIME ), 0 );
return self::DEFAULT_INTERVAL_HOURLY;
}
}
77 changes: 52 additions & 25 deletions tests/php/src/Validation/URLValidationCronTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,17 @@

namespace AmpProject\AmpWP\Tests\Validation;

use AmpProject\AmpWP\BackgroundTask\BackgroundTaskDeactivator;
use AmpProject\AmpWP\BackgroundTask\CronBasedBackgroundTask;
use AmpProject\AmpWP\Infrastructure\Conditional;
use AmpProject\AmpWP\Infrastructure\Registerable;
use AmpProject\AmpWP\Infrastructure\Service;
use AmpProject\AmpWP\Tests\DependencyInjectedTestCase;
use AmpProject\AmpWP\Tests\Helpers\PrivateAccess;
use AmpProject\AmpWP\Tests\Helpers\ValidationRequestMocking;
use AmpProject\AmpWP\Tests\TestCase;
use AmpProject\AmpWP\Validation\ScannableURLProvider;
use AmpProject\AmpWP\Validation\URLScanningContext;
use AmpProject\AmpWP\Validation\URLValidationCron;
use AmpProject\AmpWP\Validation\URLValidationProvider;

/** @coversDefaultClass \AmpProject\AmpWP\Validation\URLValidationCron */
final class URLValidationCronTest extends TestCase {
final class URLValidationCronTest extends DependencyInjectedTestCase {
use ValidationRequestMocking, PrivateAccess;

/**
Expand All @@ -26,15 +22,24 @@ final class URLValidationCronTest extends TestCase {
*/
private $test_instance;

/** @var int */
private $request_count = 0;

/**
* Setup.
*
* @inheritdoc
*/
public function setUp() {
parent::setUp();
$this->test_instance = new URLValidationCron( new BackgroundTaskDeactivator(), new ScannableURLProvider( new URLScanningContext( 20 ) ), new URLValidationProvider() );
add_filter( 'pre_http_request', [ $this, 'get_validate_response' ] );
$this->test_instance = $this->injector->make( URLValidationCron::class );
add_filter(
'pre_http_request',
function () {
$this->request_count++;
return $this->get_validate_response();
}
);
}

/**
Expand All @@ -46,24 +51,13 @@ public function test_register() {
$this->assertInstanceof( URLValidationCron::class, $this->test_instance );
$this->assertInstanceof( Service::class, $this->test_instance );
$this->assertInstanceof( Registerable::class, $this->test_instance );
$this->assertInstanceof( Conditional::class, $this->test_instance );

$this->test_instance->register();

$this->assertEquals( 10, has_action( 'admin_init', [ $this->test_instance, 'schedule_event' ] ) );
$this->assertEquals( 10, has_action( URLValidationCron::BACKGROUND_TASK_NAME, [ $this->test_instance, 'process' ] ) );
}

/** @covers ::is_needed() */
public function test_is_needed() {
$this->assertFalse( URLValidationCron::is_needed() );

add_filter( 'amp_temp_validation_cron_tasks_enabled', '__return_true' );
$this->assertTrue( URLValidationCron::is_needed() );

remove_filter( 'amp_temp_validation_cron_tasks_enabled', '__return_true' );
}

/** @covers ::schedule_event() */
public function test_schedule_event_with_no_user() {
$event_name = $this->call_private_method( $this->test_instance, 'get_event_name' );
Expand Down Expand Up @@ -102,15 +96,48 @@ public function test_schedule_event_with_user_with_permission() {
* Test validate_urls.
*
* @covers ::process()
* @covers ::get_sleep_time()
* @covers ::dequeue()
*/
public function test_validate_urls() {
$this->factory()->post->create_many( 5 );

add_filter( 'amp_url_validation_sleep_time', '__return_false' );
/** @var ScannableURLProvider $scannable_url_provider */
$scannable_url_provider = $this->get_private_property( $this->test_instance, 'scannable_url_provider' );

$initial_urls = wp_list_pluck( $scannable_url_provider->get_urls(), 'url' );
$initial_url_count = count( $initial_urls );
$this->assertGreaterThan( 0, $initial_url_count );

delete_option( URLValidationCron::OPTION_KEY );

// Verify that processing will enqueue URLs (if none are queued) and process one.
for ( $i = 1; $i <= $initial_url_count; $i++ ) {
$this->test_instance->process();
$this->assertEquals( $i, $this->request_count );
$data = get_option( URLValidationCron::OPTION_KEY );
$this->assertArrayHasKey( 'urls', $data );
$this->assertArrayHasKey( 'timestamp', $data );
$this->assertLessThanOrEqual( time(), $data['timestamp'] );
$this->assertEquals(
array_slice( $initial_urls, $i ),
$data['urls']
);
}

// Ensure no URLs are queued or processed if timestamp is less than a week.
$data = get_option( URLValidationCron::OPTION_KEY );
$this->assertCount( 0, $data['urls'] );
$before_request_count = $this->request_count;
$this->test_instance->process();
$this->assertEquals( $before_request_count, $this->request_count );
$data = get_option( URLValidationCron::OPTION_KEY );
$this->assertCount( 0, $data['urls'] );

// Now test that after a week has transpired, the next set of URLs are re-queued and one is processed.
$data['timestamp'] = time() - WEEK_IN_SECONDS - MINUTE_IN_SECONDS;
update_option( URLValidationCron::OPTION_KEY, $data );
$this->test_instance->process();
$this->assertCount( 10, $this->get_validated_urls() );
$this->assertEquals( $before_request_count + 1, $this->request_count );
$data = get_option( URLValidationCron::OPTION_KEY );
$this->assertCount( $initial_url_count - 1, $data['urls'] );
}

/** @covers ::get_event_name() */
Expand All @@ -124,7 +151,7 @@ public function test_get_event_name() {
/** @covers ::get_interval() */
public function test_get_interval() {
$this->assertEquals(
URLValidationCron::DEFAULT_INTERVAL_DAILY,
URLValidationCron::DEFAULT_INTERVAL_HOURLY,
Copy link
Member

@westonruter westonruter Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this change is problematic because there is an existing daily schedule, and because of that the change to hourly is not resulting in the re-scheduling. It seems we need to do a call to wp_get_scheduled_event() instead of using wp_next_scheduled() in RecurringBackgroundTask::schedule_event(), and then re-schedule it if the schedule does not match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took my first stab at addressing this in 422d0f2. I'm not sure it's best, however. Namely, if there was a daily schedule that is going to happen in 23 hours, but then the schedule gets changed to hourly, then the initial timestamp remains 23 hours in the future. I guess it would be better to unschedule it enturely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There, I think this is better: bf25b0c.

$this->call_private_method( $this->test_instance, 'get_interval' )
);
}
Expand Down