uniqueness does not work at scale #446

Closed
elee1766 opened this issue Jul 11, 2024 · 5 comments


@elee1766
Contributor

This issue is a follow-up to #346.

@brandur gave me this recommendation:

In your case, an alternative: drop the uniqueness checks and then implement your job such that it checks on start up the last time its data was updated. If the update was very recent, it falls through with a no op. So you'd still be inserting lots of jobs, but most of them wouldn't be doing any work, and you wouldn't suffer the unique performance penalty.
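
For concreteness, that recommendation looks roughly like the following worker. This is a minimal sketch: the resources table, the SyncArgs type, and the 15-minute freshness window are stand-ins for our own schema, not anything River prescribes.

```go
package main

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

type SyncArgs struct {
	ResourceID string `json:"resource_id"`
}

func (SyncArgs) Kind() string { return "resource_sync" }

type SyncWorker struct {
	river.WorkerDefaults[SyncArgs]
	db *pgxpool.Pool
}

func (w *SyncWorker) Work(ctx context.Context, job *river.Job[SyncArgs]) error {
	var updatedAt time.Time
	if err := w.db.QueryRow(ctx,
		`SELECT updated_at FROM resources WHERE id = $1`,
		job.Args.ResourceID,
	).Scan(&updatedAt); err != nil {
		return err
	}

	// Updated very recently: fall through with a no-op. The job still gets
	// inserted and worked, it just does no real work.
	if time.Since(updatedAt) < 15*time.Minute {
		return nil
	}

	return w.refresh(ctx, job.Args.ResourceID)
}

func (w *SyncWorker) refresh(ctx context.Context, resourceID string) error {
	// ... the actual expensive sync ...
	return nil
}
```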

However, this solution currently schedules hundreds of jobs per second across our clusters, which causes a lot of extra load across all the job logic plus the notifier.

More importantly, we have roughly 200-400k unique units of work every hour, and we would really like them to be done every 15 minutes. Without a uniqueness filter, this schedules millions of units of work every hour. They do end up getting deduplicated at work time, but at the expense of a large amount of database work that slows down other calculations and routines, which creates a vicious cycle: more jobs fail to complete, and more jobs pile up.

A side effect is that the few places where we do schedule unique jobs become very slow, so we basically can't use the unique feature in any job without fear of those scheduling operations taking multiple seconds because of everything else going on in the jobs table.

We could move River to a separate Postgres cluster, but at that point we would migrate away from River, because the advantage of it running in the same database as our data would be gone.

For now we are likely going to implement our own hooks on top of the existing River client, using InsertTx so we don't schedule tasks we don't need. But this really feels like a weakness of River's unique insert feature. I'm still not sure who it's for, since it can't scale to any reasonable throughput, and it's missing a good amount of features that come standard in other work queues (the most obvious that comes to mind is uniqueness on a subset of args).
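
Roughly the kind of hook we have in mind, reusing the SyncArgs type from the sketch above (the freshness query is again a hypothetical stand-in for our schema):

```go
package main

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/riverqueue/river"
)

// insertIfStale checks application state inside the caller's transaction and
// only enqueues a job when the data is actually stale, so redundant rows
// never hit the jobs table at all.
func insertIfStale(
	ctx context.Context,
	client *river.Client[pgx.Tx],
	tx pgx.Tx,
	args SyncArgs,
) error {
	var updatedAt time.Time
	if err := tx.QueryRow(ctx,
		`SELECT updated_at FROM resources WHERE id = $1`,
		args.ResourceID,
	).Scan(&updatedAt); err != nil {
		return err
	}

	// Fresh enough: skip scheduling entirely.
	if time.Since(updatedAt) < 15*time.Minute {
		return nil
	}

	_, err := client.InsertTx(ctx, tx, args, nil)
	return err
}
```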

It would be really nice if there were some sort of uniqueness mechanism that didn't use advisory locks. For instance, a nullable unique column on the jobs table, with a user-definable ID supplied at insert, immediately comes to mind. That would let me deduplicate tasks by a subset of arguments plus a time interval/sequence ID, which is more than enough for me.
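
A sketch of what that could look like, against a simplified stand-in table rather than River's real schema (every name here is hypothetical):

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// Assumed schema (not River's actual one):
//
//	ALTER TABLE jobs ADD COLUMN unique_key text;
//	CREATE UNIQUE INDEX jobs_unique_key ON jobs (unique_key)
//	    WHERE unique_key IS NOT NULL;
//
// Rows with a NULL key never conflict, so only opted-in jobs pay for the index.
func insertDeduped(ctx context.Context, db *pgxpool.Pool, resourceID string) error {
	// Key on a subset of args plus a 15-minute bucket: at most one job per
	// resource per window, with no advisory locks involved.
	bucket := time.Now().UTC().Truncate(15 * time.Minute).Format("2006-01-02T15:04")
	uniqueKey := "resource_sync:" + resourceID + ":" + bucket

	args, err := json.Marshal(map[string]string{"resource_id": resourceID})
	if err != nil {
		return err
	}

	_, err = db.Exec(ctx,
		`INSERT INTO jobs (kind, args, unique_key)
		 VALUES ($1, $2, $3)
		 ON CONFLICT (unique_key) DO NOTHING`,
		"resource_sync", string(args), uniqueKey,
	)
	return err
}
```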

@brandur
Contributor

brandur commented Jul 13, 2024

I'm going to look into this, but although we can speed it up, I'm a bit worried that it'll be hard to get to something that works well for you — it sounds like your app is fundamentally churning through so much work that a very busy DB will be somewhat inevitable.

@brandur
Contributor

brandur commented Jul 13, 2024

Opened #451. Should make unique insertions something like 20-45x faster as long as you stay within the default set of unique states.
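
For reference, a sketch of an insert that stays within those defaults, reusing the SyncArgs type from the sketches above: ByArgs and ByPeriod only, with ByState left unset so the default set of unique states applies.

```go
package main

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/riverqueue/river"
)

// enqueueUnique stays on the fast path: uniqueness by args within a
// 15-minute period, without customizing the unique states.
func enqueueUnique(ctx context.Context, client *river.Client[pgx.Tx], args SyncArgs) error {
	_, err := client.Insert(ctx, args, &river.InsertOpts{
		UniqueOpts: river.UniqueOpts{
			ByArgs:   true,
			ByPeriod: 15 * time.Minute,
		},
	})
	return err
}
```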

@bgentry
Contributor

bgentry commented Jul 26, 2024

I think the changes in #451 (shipped in v0.10.0) are a massive improvement on unique job performance, if you can stay within the bounds of that happy path. Let us know how it goes if you give it a try! 🙏

@bgentry closed this as completed Jul 26, 2024
@elee1766
Contributor Author

elee1766 commented Jul 26, 2024

Super excited. We are on this happy path, so I expect it to speed up our scheduling by a lot.

@bgentry this is a little off topic maybe, but how do you recommend people do long-term job metrics?

Do you think we should write something that reads the River jobs table and exports Prometheus metrics (like river-prometheus-exporter), or instrument our workers the way we instrument tracing (wrapping work functions with tracing instrumentation)?

I'm not sure which fits the vision you had for River, so we haven't made a move here yet.

@bgentry
Contributor

bgentry commented Jul 26, 2024

My 100% recommendation is to instrument the workers or use the client subscriptions to do this kind of metrics work. As your jobs table grows, scanning it in any way other than the exact queries used by River is going to have severe performance impacts.
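
A sketch of the subscription flavor of this, assuming the standard Prometheus Go client (the metric names are made up):

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/riverqueue/river"
)

var (
	jobsCompleted = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "river_jobs_completed_total",
	}, []string{"kind"})
	jobsFailed = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "river_jobs_failed_total",
	}, []string{"kind"})
)

// watchJobs consumes River's in-process event stream; nothing here ever
// scans the jobs table.
func watchJobs(ctx context.Context, client *river.Client[pgx.Tx]) {
	events, cancel := client.Subscribe(
		river.EventKindJobCompleted,
		river.EventKindJobFailed,
	)
	defer cancel()

	for {
		select {
		case <-ctx.Done():
			return
		case event, ok := <-events:
			if !ok {
				return
			}
			switch event.Kind {
			case river.EventKindJobCompleted:
				jobsCompleted.WithLabelValues(event.Job.Kind).Inc()
			case river.EventKindJobFailed:
				jobsFailed.WithLabelValues(event.Job.Kind).Inc()
			}
		}
	}
}
```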
