Optimize force_publish_missed_schedules and confirm_scheduled_posts queries #376
This update optimizes the queries used by the `force_publish_missed_schedules` and `confirm_scheduled_posts` functions, triggered by the `a8c_cron_control_force_publish_missed_schedules` and `a8c_cron_control_confirm_scheduled_posts` internal events, respectively.

In its current form, the query can take several seconds to run on a table with millions of rows. Since it runs every two minutes (10 minutes for `confirm_scheduled_posts`), it sometimes pollutes the slow query logs for some customers. The original query looks like this:

When tested on a table with over 19M rows, this query had an estimated cost of `11411403.00` and relied on the `type_status_date` index. However, it didn't fully take advantage of the index's structure. The access type for the main query was `index`, meaning a large portion of the index was still being scanned, with a filtering efficiency of only `3.33%`. (When MySQL uses `index` as the access type, it reads the entire index sequentially instead of scanning the table's rows directly. This differs from `range` or `ref`, which scan only the parts of an index that match the conditions.) The inefficiency stemmed from applying the filters for `post_status` and `post_date` too late, leading to unnecessary overhead and slow performance.

The optimized query introduces a subquery to retrieve distinct `post_type` values:

This result is then joined with the main table (`wp_posts`) to filter rows more effectively. In the test, this reduces the query cost to `1316524.20` (an `88%` improvement). The subquery creates a temporary table with only the distinct `post_type` values, scanning just `3,483` rows. The main query then uses a `ref` access type with the `type_status_date` index, focusing only on rows where `post_status` and `post_date` match the criteria. This drops the number of rows examined per scan to `3,577` and improves filtering efficiency to `33.33%`.

This improvement is primarily due to better use of the `type_status_date` index, which includes `post_type`, `post_status`, `post_date`, and `ID`, in that order. By using a subquery to pre-filter all possible `post_type` values, the main query avoids scanning irrelevant rows from the index. Additionally, applying the `post_status` filter earlier in the main query leverages the index's second column, enabling MySQL to filter rows more effectively and reduce unnecessary scans.

The result is a query that performs much better on large datasets, cutting down resource usage and execution time without changing its functionality. In testing, the execution time for the query on the same 19M-row table dropped from `4.51s` to just `0.001s`. This change makes the functionality more scalable on tables with tens of millions of rows and reduces the load on customer databases.

I also realized test cases didn't exist for the `force_publish_missed_schedules` and `confirm_scheduled_posts` functions, so I added them as part of this PR.
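
For reviewers who want to sanity-check the "without changing its functionality" claim, here is a minimal sketch of the join-on-distinct-`post_type` pattern using Python's built-in `sqlite3`. Everything in it is an assumption made for illustration: the trimmed-down `wp_posts` schema, the sample rows, and both query strings, which are stand-ins for the shape described above and not the exact SQL from this PR. The `ORDER BY` is added only to make the comparison deterministic. SQLite's planner differs from MySQL's, so this checks result equivalence only and says nothing about cost or access type.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table loosely modeled on wp_posts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts ("
        " ID INTEGER PRIMARY KEY,"
        " post_type TEXT NOT NULL,"
        " post_status TEXT NOT NULL,"
        " post_date TEXT NOT NULL)"
    )
    # Same column order as the type_status_date index described above.
    conn.execute(
        "CREATE INDEX type_status_date"
        " ON wp_posts (post_type, post_status, post_date, ID)"
    )
    rows = [
        ("post", "future", "2024-01-01 00:00:00"),        # missed schedule
        ("page", "future", "2024-01-02 00:00:00"),        # missed schedule
        ("post", "future", "2999-01-01 00:00:00"),        # still scheduled
        ("post", "publish", "2024-01-01 00:00:00"),       # already published
        ("attachment", "inherit", "2024-01-01 00:00:00"), # unrelated status
    ]
    conn.executemany(
        "INSERT INTO wp_posts (post_type, post_status, post_date)"
        " VALUES (?, ?, ?)",
        rows,
    )
    return conn


# Stand-in for a query that filters only on post_status and post_date.
FLAT_QUERY = """
SELECT ID FROM wp_posts
WHERE post_status = 'future' AND post_date <= ?
ORDER BY ID
"""

# Stand-in for the optimized shape: join against the distinct post_type
# values so the leading column of the index can be used for lookups.
JOIN_QUERY = """
SELECT p.ID
FROM wp_posts AS p
JOIN (SELECT DISTINCT post_type FROM wp_posts) AS t
  ON p.post_type = t.post_type
WHERE p.post_status = 'future' AND p.post_date <= ?
ORDER BY p.ID
"""


def missed_ids(conn, query, now):
    """Return the IDs selected by `query` for the given cutoff timestamp."""
    return [row[0] for row in conn.execute(query, (now,))]


conn = build_demo_db()
now = "2024-06-01 00:00:00"
flat = missed_ids(conn, FLAT_QUERY, now)
joined = missed_ids(conn, JOIN_QUERY, now)
print(flat, joined)
```

The equivalence relies on `post_type` being non-null: every row's `post_type` appears in the distinct set, so the join neither drops nor duplicates rows.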