Optimize force_publish_missed_schedules and confirm_scheduled_posts queries #376

rbcorrales · 2024-12-14T16:49:12Z

This update optimizes the queries used by the force_publish_missed_schedules and confirm_scheduled_posts functions, triggered by the a8c_cron_control_force_publish_missed_schedules and a8c_cron_control_confirm_scheduled_posts internal events, respectively.

In its current form, the query can take several seconds to run on a table with millions of rows. Since it runs every 2 minutes (every 10 minutes for confirm_scheduled_posts), it regularly shows up in the slow query logs for some customers. The original query looks like this:

SELECT ID FROM {$wpdb->posts} WHERE post_status = 'future' AND post_date <= %s LIMIT 0,100;

When tested on a table with over 19M rows, this query had an estimated cost of 11411403.00 and relied on the type_status_date index, but it didn't take full advantage of the index's structure. The access type for the main query was index, with a filtering efficiency of only 3.33%. With the index access type, MySQL reads the entire index sequentially instead of scanning the table's rows directly. This differs from range or ref, which read only the parts of the index that match the query's conditions. The inefficiency stemmed from applying the post_status and post_date filters too late, only after a large portion of the index had already been read, leading to unnecessary overhead and slow performance.

The optimized query introduces a subquery to retrieve distinct post_type values:

SELECT DISTINCT post_type FROM {$wpdb->posts}

This result is then joined with the main table (wp_posts) to filter rows more effectively. In the test, this reduces the query cost to 1316524.20 (an 88% reduction). The subquery creates a temporary table with only the distinct post_type values, scanning just 3,483 rows. The main query then uses a ref access type with the type_status_date index, focusing only on rows where post_status and post_date match the criteria. This drops the number of rows examined per scan to 3,577 and improves filtering efficiency to 33.33%.

This improvement is primarily due to better use of the type_status_date index, which includes post_type, post_status, post_date, and ID, in that order. By using a subquery to enumerate all existing post_type values, the main query can match on the index's leading column instead of scanning irrelevant entries. With post_type matched, the post_status filter applies to the index's second column, so MySQL can go straight to the relevant part of the index and avoid unnecessary scans.
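To illustrate why enumerating post_type values helps, here is a toy Python model. It is not MySQL and not the code in this PR: the index is modeled as a list of tuples sorted by (post_type, post_status, post_date, ID), and the helper names (sample_rows, distinct_post_types, full_index_scan, per_type_lookup) are made up for this sketch. It compares reading every index entry against seeking to each (post_type, post_status) prefix with binary search.

```python
from bisect import bisect_left, bisect_right


def sample_rows():
    """Hypothetical data: 3 post types x 28 days x 4 posts per day."""
    rows, post_id = [], 0
    for post_type in ("attachment", "page", "post"):
        for day in range(1, 29):
            for status in ("draft", "future", "publish", "publish"):
                post_id += 1
                rows.append((post_type, status, f"2024-12-{day:02d} 00:00:00", post_id))
    return rows


def distinct_post_types(index):
    """Collect distinct leading-column values by hopping from group to group."""
    types, pos = [], 0
    while pos < len(index):
        post_type = index[pos][0]
        types.append(post_type)
        # post_type + "\x00" is the smallest string greater than post_type,
        # so this jumps past every entry that shares the current post_type.
        pos = bisect_left(index, (post_type + "\x00",), pos)
    return types


def full_index_scan(index, status, cutoff):
    """Model of the 'index' access type: read every entry, filter afterwards."""
    ids, examined = [], 0
    for _post_type, post_status, post_date, post_id in index:
        examined += 1
        if post_status == status and post_date <= cutoff:
            ids.append(post_id)
    return ids, examined


def per_type_lookup(index, status, cutoff):
    """Model of the rewritten query: one prefix seek per distinct post_type."""
    ids, examined = [], 0
    for post_type in distinct_post_types(index):
        lo = bisect_left(index, (post_type, status))
        hi = bisect_right(index, (post_type, status, cutoff, float("inf")))
        examined += hi - lo
        ids.extend(entry[3] for entry in index[lo:hi])
    return ids, examined


if __name__ == "__main__":
    index = sorted(sample_rows())  # sorted like the composite index
    cutoff = "2024-12-07 00:00:00"
    scan_ids, scan_examined = full_index_scan(index, "future", cutoff)
    seek_ids, seek_examined = per_type_lookup(index, "future", cutoff)
    print("same IDs:", sorted(scan_ids) == sorted(seek_ids))
    print("entries examined:", scan_examined, "vs", seek_examined)
```

Both functions return the same IDs, but the per-type version only touches the entries inside each matching slice. The real optimizer's behavior depends on table statistics and MySQL version, so this only conveys the general idea.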

The result is a query that performs much better on large datasets, cutting down resource usage and execution time without changing its functionality. In testing, the execution time for the query on the same 19M-row table dropped from 4.51s to just 0.001s. This makes the functionality more scalable on tables with tens of millions of rows and reduces the load on customer databases.
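The claim that functionality is unchanged can be sanity-checked with a small sketch like the one below. Several things here are assumptions: the description does not show the full rewritten query, so the JOIN is only a plausible shape based on the text above; an in-memory SQLite table named posts stands in for {$wpdb->posts}; and LIMIT is left out so the two result sets can be compared deterministically. SQLite is used only to compare results, not query plans or timings.

```python
import sqlite3

# Original query from the description, without LIMIT.
ORIGINAL = """
SELECT ID FROM posts
WHERE post_status = 'future' AND post_date <= ?
"""

# Assumed shape of the rewritten query: join against the distinct post_type values.
REWRITTEN = """
SELECT p.ID FROM posts AS p
JOIN (SELECT DISTINCT post_type FROM posts) AS t
  ON p.post_type = t.post_type
WHERE p.post_status = 'future' AND p.post_date <= ?
"""


def make_db():
    """Build a tiny stand-in table with an index shaped like type_status_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "ID INTEGER PRIMARY KEY, post_type TEXT NOT NULL, "
        "post_status TEXT NOT NULL, post_date TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX type_status_date "
        "ON posts (post_type, post_status, post_date, ID)"
    )
    rows = [
        (post_type, status, f"2024-12-{day:02d} 00:00:00")
        for post_type in ("attachment", "page", "post")
        for status in ("draft", "future", "publish")
        for day in range(1, 21)
    ]
    conn.executemany(
        "INSERT INTO posts (post_type, post_status, post_date) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def matching_ids(conn, sql, cutoff):
    """Run a query with the cutoff date bound and return the sorted IDs."""
    return sorted(row[0] for row in conn.execute(sql, (cutoff,)))


if __name__ == "__main__":
    conn = make_db()
    cutoff = "2024-12-10 00:00:00"
    same = matching_ids(conn, ORIGINAL, cutoff) == matching_ids(conn, REWRITTEN, cutoff)
    print("result sets match:", same)
```

Because every row's post_type appears exactly once in the DISTINCT subquery, the join neither drops nor duplicates rows, which is why the two queries select the same posts.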

I also realized test cases didn't exist for the force_publish_missed_schedules and confirm_scheduled_posts functions, so I added them as part of this PR.

WPprodigy

Thanks for this! It's a fascinating performance improvement.

Optimize force_publish_missed_schedules query

bab2d13

rbcorrales requested review from rinatkhaziev and WPprodigy December 14, 2024 16:49

Fix unrelated test case issue

6f75301

rinatkhaziev added the [Status] Needs Review label Dec 16, 2024

Optimize confirm_scheduled_posts query

935acff

rbcorrales changed the title ~~Optimize force_publish_missed_schedules query~~ Optimize force_publish_missed_schedules and confirm_scheduled_posts queries Dec 18, 2024

rinatkhaziev added 2 commits December 20, 2024 12:46

Merge branch 'main' into missed-schedules-optimization

927306d

Merge branch 'main' into missed-schedules-optimization

5f9e6c1

WPprodigy approved these changes Jan 13, 2025


WPprodigy merged commit 7e32816 into main Jan 13, 2025
14 of 22 checks passed

WPprodigy deleted the missed-schedules-optimization branch January 13, 2025 22:14

WPprodigy mentioned this pull request Jan 14, 2025

Update Cron Control publish queries Automattic/vip-go-mu-plugins#6097

Merged
