-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ETA times go up instead of down #580
Comments
This seems like a clear case where The issue you're seeing here is that indicatif heavily prioritizes data from the most recent 15 seconds, under the assumption that for most use cases changes that happen on order of 15 seconds are significant. When you enable the steady tick, the automatic tick revises the estimate every time, so if you go 5 seconds without any new data that's a fairly significant event as far as the algorithm is concerned. Making improvements to this approach is #556 but I think you'd be fine without a steady tick here. Incidentally, indicatif makes the assumption that steps (the thing you increment) are the basic unit of progress. So until a step occurs, indicatif assumes that progress is stalled, meaning that an increased estimate makes sense. In your case this isn't really true, progress is constantly happening but you only add a step when a benchmark is completed. Indicatif doesn't try to guess how much progress is "secretly" happening in the background in between steps, it just assumes no progress has happened. For e.g. network transfers of bytes, this is a safe assumption. For other use cases, maybe not. |
Then how do I show up-to-date ETA information to the user? How do I indicate progress by updating the spinner (not included in the minimal example above)?
To be honest, it feels wrong to me to have a hard-coded time-scale in a general-purpose progress bar library.
Why? Is
This seems like a very strong assumption. Isn't "we will advance the progress bar at approximately equidistant time intervals" the most common use case? In that scenario, it would be more reasonable to assume that
Maybe that's just me, but "network transfer of bytes" with irregularly appearing stalls of progress seems like a much more exotic use case compared to: we |
I think this is effectively a duplicate of #556. I would like us to do better here but it's been hard to prioritize for me. I agree the steady ticker should make sense here and we should probably tweak our ETA calculation again. It's just hard to come up with good heuristics to cover all the different use cases -- any help would be appreciated. |
I understand that you would like to avoid adding new parameters if possible. But I think it's not unreasonable to have a /// Set this parameter to improve the ETA calculation by providing
/// prior knowledge about the expected timing of `.inc` calls.
enum EtaHint {
/// Progress is expected to happen at a roughly constant frequency, e.g. every X milliseconds.
/// If the (rough) duration between `.inc` calls is known, it can be specified to even get ETA
/// estimates before the first `.inc` call.
Regular(Option<Duration>),
/// Progress can be erratic (e.g. transfer of bytes over a network connection). Assume that
/// progress is currently stalled if no `.inc` has happened within the specified time duration.
Erratic(Duration),
} |
I would like to see more proof that we can't come up with a better heuristic before we add configuration for this. It's easy for everyone to suggest configuration that serves their own use cases but overall worse for the ergonomics of using the crate. |
Fair point. I'm not familiar with the math/algorithm behind the ETA estimation. But I would argue that anything that involves a concrete, hard-coded time scale (the 15 seconds mentioned above) will turn So if you want to find an general-purpose algorithm without any parameters that supports all possible use cases, it can not involve a hard-coded time scale. Instead, it can only look at the timing deltas between Note: I'm not saying that an estimator with a hard-coded time scale is a bad idea. But it can't solve all use cases (without being parametrizable) because it always breaks a (scaling) symmetry in the time dimension.
I don't know. An additional parameter is always a liability in terms of API stability (and increases code complexity), but I don't think it necessarily makes this crate more complex to use. There could still be a reasonable default setting. |
I came across this bug in latest version hyperfine (benchmarking tool). With long cycle tymes (60+ seconds) 10 minutes ETA goes up to several days. After each tick it jumps to reasonable value. This example goes up to 30 days ETA.
This behavior was introdused in 011c998 |
Here is another example, video encoding. The eta is 13h then increases up to 8d until ticking back to 13h and repeating. long-encode.webmWith set_position logging, we can see the eta increases while the progress integer remains the same and ticks back when the integer increments. long-encode-debug-logging.webm |
Yes, I don't think we need more evidence that there is an issue, just someone willing to invest effort to research making it better. Unfortunately I'm unlikely to have time to do that anytime soon.
|
Ok. How about we just revert it, untill better solution found. PR #620 |
I did some thinking about this issue today. The primary thing I don't want to break is stall detection. The estimated time remaining should approach infinity exponentially if no updates are received indefinitely; that is a basic and correct behavior of the EWA. (If you unplug your modem, a data transfer will never complete.) So I decided to try the following idea: keep an unbiased estimate of the maximum delay between updates, and set the weight to a multiple of that value when it exceeds the default weighting (15 seconds). I chose the multiple 6.6 because that means that if progress stalls for 2x the estimated maximum delay between updates, the ETA will double. This strikes a good balance between fluctuating too much and requiring too long to report a stall, IMO. * Obviously this is still not going to be ideal for everyone. Something that's very important to emphasize is that if you don't want live updates on your ETA, which necessarily increase when progress isn't made, you should not be using The one downside of this approach I can think of is that it seriously complicates the estimate from a mathematical perspective. If we kept every single tick stored internally, we could correctly reweight all the old data whenever the tuning parameter gets changed, but we don't (and can't) store all that data. So with this change indicatif will effectively be using old data with the "wrong" weights in its new estimates, so I expect estimates to only asymptotically approach correct behavior. It's possible that pathological behavior could occur when progress occurs in unexpected patterns - test cases are definitely needed here. Please check out my branch and run it with your use cases and let me know if you can find behavior that seems completely unreasonable, especially if the same test case works fine with the current indicatif release. https://github.com/afontenot/indicatif/tree/dynamic-estimator-reweighting * There's an unfortunate corner case. If your Internet goes down for 12 hours, then comes back and the transfer resumes, a new stall won't be adequately reported in the estimate unless progress stops for a day or more. I don't know that this is "wrong" exactly, but it's potentially frustrating. |
@afontenot sounds promising! Would be great to get more feedback from people who are suffering from this issue. |
Maybe this problem could be resolved with a head-on approach.
|
Thank you for providing and maintaining this crate.
We have the following use case: we run benchmarks where each iteration takes the same time (approximately), but the iteration time can be anything from a few milliseconds to several hours. Especially for longer-running benchmarks (> a few seconds), we want to show an ETA to the user.
Recently, we noticed a regression in the ETA computation after updating to indicatif 0.17.5 (but the behavior wasn't exactly great before either..). It can be demonstrated with the following program (which simply ticks the progress bar
count
times, every five seconds):The output looks like this (ignore spacing issues, the ETA is what I'm looking at):
After five seconds, the ETA jumps to 00:00:05, which is expected (there is one more iteration, the first took 5 seconds). But then, instead of continuously counting down to zero, the time goes up to almost 10 seconds, shortly before the second iteration finishes.
The behavior becomes even worse when
count
is larger than 2 or if the iteration time is larger.This looks very related to #556. I wasn't sure if I should comment there, so feel free to close this as a duplicate if it's the exact same thing.
See also: previous issues reported by me with the same use case, both resolved at the time: #122 #81.
The text was updated successfully, but these errors were encountered: