-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeouts to queue messages #3196
Conversation
All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-3196-all-demos |
48c192c
to
d3abf13
Compare
Isn't five seconds a really long time? |
@pngwn Yea we can do 1 second? Sending the message should be fast but don't want to set something too low that would impact people on slow networks. |
Just getting a chance to look at this. Very cool hunting down the issue. What you're proposing is reasonable, although I worry that a 5 second timeout might actually be too little given the fact that these websocket messages could contain large files particularly if multiple images are involved, which are sent as base64 strings Also, it is a very good sign that:
But currently the other Spaces from the issue such as:
are also running fine. Have you seen these Spaces build up a large queue while the others haven't? |
@abidlabs Thanks for the review! Good point about the latency for large images. Let me toy with setting variable timeouts for the different messages. I think the getting the data and sending the result should have long timeouts (a minute) but the other message payloads are quite small and I think a second will suffice. Will ping when this is ready for review again! |
Can confirm the |
0b0415c
to
fc156fb
Compare
@abidlabs I set the That lets me upload a 44 mb image to this space: https://huggingface.co/spaces/gradio/image_mod and not get a timeout. I think 60 seconds is reasonable since I don't think total image payloads above 50mb will be common. And for other file types we will only send the filename when #3191 is ready. That being said, we can also set it higher. Also, users will be notified if they reached the timeout uploading their data: |
Getting some interesting behavior when testing out the deployed PR Space. The Gonna take a deeper look later today but thought I'd flag this for now @freddyaboulton |
@abidlabs I can't reproduce 🤔 |
I think this looks good. Let me just resurrect the queue benchmarking script and test the performance on this branch vs. main |
Ran the benchmarking script on this branch vs. main and here's what I found as the avg latency over 1000 runs (seconds):
Given the results, I think we can go ahead and merge |
Thank you for testing @abidlabs 🙏 Will merge once ci passes! |
* Fix + test * Remove print statements + fix import for 3.7 * CHANGELOG * Remove more print statements * Add 60 second timeout for uploading data * Fix test --------- Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
* dropdown * more dropdown updates * dropdown styling + option visibility * changelog * notebook * fix test * Allow more image formats (#3225) * add wildcard to image input * simplify mime types * changelog * regen noteboks --------- Co-authored-by: Abubakar Abid <abubakar@huggingface.co> Co-authored-by: pngwn <hello@pngwn.io> * fix webcam mirroring (#3245) * fix webcam * changelog * fix changelog * fix changelog * fix changelog --------- Co-authored-by: Abubakar Abid <abubakar@huggingface.co> * Add `interactive=False` mode to `gr.Button` (#3266) * add interactive=False to button * add interactive=True by default * changelog * fix frontend * fix backend test * formatting * review changes * LaTeX height fix (#3258) * latex height fix * changelog * formatting * em * em * accidentally added script (#3273) * Adding a script to benchmark the queue (#3272) * added benchmark queue script * changelg * fix instructions * Fix matplotlib image size (#3274) * Fix matplotlib css * CHANGELOG * Undo lockfile * Add timeouts to queue messages (#3196) * Fix + test * Remove print statements + fix import for 3.7 * CHANGELOG * Remove more print statements * Add 60 second timeout for uploading data * Fix test --------- Co-authored-by: Abubakar Abid <abubakar@huggingface.co> * icons * separate options into component * formatting * changelog * changelog * fix ui tests * formatting again... * backend test fix * format * doc fixes --------- Co-authored-by: Abubakar Abid <abubakar@huggingface.co> Co-authored-by: fienestar <fienestar@gmail.com> Co-authored-by: pngwn <hello@pngwn.io> Co-authored-by: Freddy Boulton <alfonsoboulton@gmail.com>
Description
Fixes #3187
I'm not 100% sure what the cause of the original issue is. Based on what I'll explain below, I think it may be that the front-end client is dropping connections during load events that are queued. Even if that isn't the root cause, I think what I propose here is a reasonable change in behavior that does fix the problem.
These are my findings:
I noticed that the
process_events
method would get stuck in theget_message
call indefinitely. See the example print statements below:Since the
event
was already in the active jobs slot andconcurrency_count=1
by default, no other jobs would get pulled off the queue which is why it grew forever.Since I noticed that it was getting stuck on
get_message
, I figuredevent.websocket.receive_json()
was the culprit. I decided to add a 5 second time out and I haven't noticed any weirdness in https://huggingface.co/spaces/gradio/blocks_random_slider or https://huggingface.co/spaces/gradio/native_plots (I installed this PR on those spaces) . I also noticed that the timeout statements were getting printed, which means that we successfully avoid waiting for ever!Will monitor for a bit though in case the problem still persists. Although, adding a timeout seems like a reasonable thing to do anyways.
I think a five second timeout is reasonable as this just applies to sending/receiving a json payload.
I feel like this is how I sound like explaining this:
Checklist:
A note about the CHANGELOG
Hello 👋 and thank you for contributing to Gradio!
All pull requests must update the change log located in CHANGELOG.md, unless the pull request is labeled with the "no-changelog-update" label.
Please add a brief summary of the change to the Upcoming Release > Full Changelog section of the CHANGELOG.md file and include
a link to the PR (formatted in markdown) and a link to your github profile (if you like). For example, "* Added a cool new feature by
[@myusername](link-to-your-github-profile)
in[PR 11111](https://github.com/gradio-app/gradio/pull/11111)
".If you would like to elaborate on your change further, feel free to include a longer explanation in the other sections.
If you would like an image/gif/video showcasing your feature, it may be best to edit the CHANGELOG file using the
GitHub web UI since that lets you upload files directly via drag-and-drop.