Optimize force_publish_missed_schedules and confirm_scheduled_posts queries #376

Merged Jan 13, 2025 (5 commits)

@rbcorrales rbcorrales commented Dec 14, 2024

This update optimizes the queries used by the force_publish_missed_schedules and confirm_scheduled_posts functions, triggered by the a8c_cron_control_force_publish_missed_schedules and a8c_cron_control_confirm_scheduled_posts internal events, respectively.

In its current form, the query can take several seconds to run on a table with millions of rows. Since it runs every two minutes (every 10 minutes for confirm_scheduled_posts), it sometimes pollutes the slow query logs for some customers. The original query looks like this:

SELECT ID FROM {$wpdb->posts} WHERE post_status = 'future' AND post_date <= %s LIMIT 0,100;

When tested on a table with over 19M rows, this query had an estimated cost of 11411403.00 and relied on the type_status_date index, but it didn't take full advantage of the index's structure. The access type for the main query was index, with a filtering efficiency of only 3.33%. With the index access type, MySQL reads the entire index sequentially instead of scanning the table's rows directly; this differs from range or ref, which read only the parts of an index that match the conditions. As a result, a large portion of the index was still being scanned. This inefficiency stemmed from applying the filters for post_status and post_date too late, leading to unnecessary overhead and slow performance.
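For anyone who wants to reproduce this kind of measurement, the plan can be inspected with EXPLAIN. This is only a sketch: it assumes the default wp_posts table name and uses a placeholder timestamp in place of the prepared %s value, and the numbers will differ from table to table.

```sql
-- Assumptions: default table name wp_posts; the timestamp is a placeholder
-- standing in for the value bound to %s.

-- Tabular plan: the "type" column shows the access type (index, range, ref, ...),
-- and "filtered" shows the estimated percentage of rows kept by the conditions.
EXPLAIN
SELECT ID
FROM wp_posts
WHERE post_status = 'future'
  AND post_date <= '2024-12-14 00:00:00'
LIMIT 0, 100;

-- JSON plan: includes the optimizer's cost estimate under cost_info.query_cost.
EXPLAIN FORMAT=JSON
SELECT ID
FROM wp_posts
WHERE post_status = 'future'
  AND post_date <= '2024-12-14 00:00:00'
LIMIT 0, 100;

-- Show the columns of the index and their order (Seq_in_index).
SHOW INDEX FROM wp_posts WHERE Key_name = 'type_status_date';
```

The access type and filtered values in the tabular output are the figures to compare before and after the change.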

The optimized query introduces a subquery to retrieve distinct post_type values:

SELECT DISTINCT post_type FROM {$wpdb->posts}

This result is then joined with the main table (wp_posts) to filter rows more effectively. In the test, this reduces the query cost to 1316524.20 (an 88% reduction). The subquery creates a temporary table with only the distinct post_type values, scanning just 3,483 rows. The main query then uses a ref access type with the type_status_date index, focusing only on rows where post_status and post_date match the criteria. This drops the number of rows examined per scan to 3,577 and improves filtering efficiency from 3.33% to 33.33%.

This improvement is primarily due to better use of the type_status_date index, which includes post_type, post_status, post_date, and ID in that order. Because post_type is the leading column, the original query, which has no condition on post_type, can't seek into the index. By using a subquery to enumerate all existing post_type values and joining on them, the main query gets an equality match on the leading column and avoids scanning irrelevant rows from the index. Additionally, applying the post_status filter earlier in the main query leverages the index's second column, enabling MySQL to filter rows more effectively and reduce unnecessary scans.
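As a rough illustration of the query shape described above (not necessarily the exact statement merged in this PR), the join could look like the sketch below. The aliases p and t, the wp_posts table name, and the placeholder timestamp are assumptions made for the example.

```sql
-- Sketch only: aliases p and t, the wp_posts table name, and the timestamp
-- literal are illustrative; the real code uses {$wpdb->posts} and a prepared %s.
SELECT p.ID
FROM wp_posts AS p
JOIN (
    -- Derived table with one row per existing post_type value.
    SELECT DISTINCT post_type FROM wp_posts
) AS t
    ON p.post_type = t.post_type          -- equality on the index's first column
WHERE p.post_status = 'future'            -- equality on the index's second column
  AND p.post_date <= '2024-12-14 00:00:00' -- upper bound on the index's third column
LIMIT 0, 100;
```

Every row in wp_posts has a post_type that appears in the derived table, so the join doesn't exclude any rows; it only gives the optimizer a post_type value to look up in type_status_date for each distinct type.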

The result is a query that performs much better on large datasets, cutting down resource usage and execution time without changing its functionality. In testing, the execution time for the query on the same 19-million-row table dropped from 4.51s to 0.001s. This change makes the functionality more scalable on tables with tens of millions of rows and reduces the load on customer databases.

I also realized test cases didn't exist for the force_publish_missed_schedules and confirm_scheduled_posts functions, so I added them as part of this PR.

@rbcorrales rbcorrales changed the title Optimize force_publish_missed_schedules query Optimize force_publish_missed_schedules and confirm_scheduled_posts queries Dec 18, 2024
@WPprodigy WPprodigy left a comment

Thanks for this! It's a fascinating performance improvement.

@WPprodigy WPprodigy merged commit 7e32816 into main Jan 13, 2025
14 of 22 checks passed
@WPprodigy WPprodigy deleted the missed-schedules-optimization branch January 13, 2025 22:14