-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with the index on nextrun
#106
Comments
We dropped the index The RDS CPU usage dropped from ~80% to 4% immediately after dropping the index - And cavalcade jobs execution throughput can be seen in this screenshot, This was executing ~130 jobs per hours from this it executed 1.5K jobs in ~30 minutes, only change in between was the dropped index - |
Thanks for the detailed ticket and graphs folks. The new indexes where added as a follow up to #75 as more complex queries were been made as the site loaded. The theory was that it would speed them up. Evidently this turned out to be incorrect for larger sites or once the jobs table grew to include a number of failed and completed jobs. As Cavalcade is designed for larger sites, I am happy if it's removed. As a completely ignorant question, does the order of the SELECT *
FROM wp_cavalcade_jobs
WHERE site = 19
AND hook = 'batch_export_pages_0'
AND args = '<SERIALIZED-ARRAY>'
AND status IN( 'waiting', 'running' )
AND nextrun BETWEEN '1970-01-01 00:00:00' AND '2020-11-27 08:48:16'
ORDER BY nextrun ASC
LIMIT 1
; Also, I gather the above is run while scheduling a single task within the next ten minutes or a zero timestamp. The time clause when scheduling such tasks could be simplified too: SELECT *
FROM wp_cavalcade_jobs
WHERE site = 19
AND hook = 'batch_export_pages_0'
AND args = '<SERIALIZED-ARRAY>'
AND status IN( 'waiting', 'running' )
AND nextrun <= '2020-11-27 08:48:16'
ORDER BY nextrun ASC
LIMIT 1 |
I haven't tested, but I think this wouldn't make a difference, as the In general, MySQL will pick the index that means to search the smallest number of rows. Looking at the Cavalcade jobs table, this can be different across projects, though. As I mentioned before, it would be possible to use But I am still not convinved the From that same project, this is the TOP 5 of queries over the last five days (other than the above one):
None of them would make good use of the Would be good to also get some input from either @rmccue or @roborourke... |
Thanks for this in depth analysis @tfrommen and @Nikschavan. This seems like an overwhelming argument to remove the index. Let's make that happen! |
Agreed, let's drop it. |
Thanks @tfrommen - this is now available in v2.0.2 |
Thanks folks, sorry I broke your product.
|
@peterwilsoncc I'll send you the invoice. |
So I upgraded from 2.0.1 to 2.0.2 (using composer), and I'm still seeing the |
@archon810 you would need to run |
@tfrommen Thank you, that did it. What other ways trigger the upgrade routine? Shouldn't it run automatically at some point after the update? This is probably something that should be in release notes, otherwise nobody's going to get the benefits of the db change unless they know to run the upgrade command (and things would be a lot more disastrous if the db structure actually contained breaking changes). |
@archon810 It doesn't run automatically as the upgrade operation can affect the database in load-affecting ways, such as locking writes or just doing a lot of writes. Cavalcade is designed to continue operating with an non-upgraded database (similarly to WordPress). We can probably do a better job of documenting this though. :) |
Updated the release notes for 2.0.2. |
In #95 (👋 @peterwilsoncc 🙂), several new indexes on the jobs table were introduced, one of them being
nextrun
, on the column with the same name.I would like to question the benefit of having said index, and, in fact, illustrate why I think it actually is a performance burden than anything else.
Context
Firstly, the
nextrun
column is of typedatetime
.Given the nature of the data, that is, information about cron tasks, it is highly unlikely to have several rows with the exact same timestamp. Sure, for high-traffic sites with lots of scheduled tasks, and/or bugs (or just logical errors) in the code scheduling said tasks, you might end up with a few tasks being scheduled for the exact same time.
In reality, however, this is not often the case. And even if it was, I assume we would be talking about less than 10 and definitely less than 100 tasks. Nothing that an index would make a meaningful difference in.
Since no other index covers the
nextrun
column, MySQL (or whatever your DB is) will try and make good use of the dedicatednextrun
index, whenever thenextrun
column is used for querying and/or sorting/grouping etc.Problem
In theory, the
nextrun
index is a one-to-many mapping. However, in practice, it will most probably only exist as one-to-one (or maybe one-to-few) map, and thus one level of unnecessary indirection.I have not checked all (core) functionality or rather the respective queries including
nextrun
, but pretty much all usage of theJob::get_jobs_by_query
method that results in MySQL queries includingnextrun BETWEEN ... AND ...
ornextrun < ...
, will mean a huge performance burden.Here is a query taken from a live project:
Given a table with 1.7 million jobs, executing the above query takes about 4 seconds!
The
1970-01-01 00:00:00
timestamp—the Beginning of Time 😉—is due to using0
in case when an event's timestamp is more than 10 minutes in the past. However, the date range does not really pose an issue, and using something like2020-01-01 00:00:00
instead will perform just as badly.Possible Solutions
I see, at least, two ways to go about this:
If we can agree that the index is, in fact, useless, given the nature of time, let's just delete it.
If, on the other hand, there is something to gain by having the index—I'm looking forward to learning what that would look like—I suggest to ignore the index for queries that pose problematic. This can be done by enhancing
Job::get_jobs_by_query
to addIGNORE INDEX (nextrun)
to the query for a non-integer timestamp, that is, the'past'
string, or a range of timestamps.Granted, there is a third option, which is to regularly truncate (or clean up) the jobs table. Oftentimes, however, this is not possible or desired.
Data
Here is some real-world data, underlining the above suggestion of ignoring/deleting the index.
For all queries, I ran
RESET QUERY CACHE;
right before them, to not have differing results due to cached data.Given a jobs table with almost 1 million rows (960,764), will execute the above query as follows:
As stated, this will search a total of 480,467 rows, which is about half the size of the whole table!
Execution, on the project's staging server, took 4.33114400 seconds.
This is the detailed profile (using
SET profiling = 1
andSHOW PROFILE FOR QUERY #
):The exact same query, now using
IGNORE INDEX (nextrun)
, will be executed like so:This is using filesort, and/but will only search a total of 448 rows! In this example, this is even less than 1% of the previous query using the index (to no avail).
The profile looks as follows:
Executing the query, overall, takes 0.00305075 seconds, or 3 milliseconds, which is less than 0.1% of the previous query execution time.
Thoughts?
I'm happy to hear anyone's thoughts on the current state, the problem that I explained, as well as any kind of possible solution! Especially interested to hear @peterwilsoncc's reasons for introducing the
nextrun
index. 🙂The text was updated successfully, but these errors were encountered: