Supporting fixed number of training/testing episodes? #5071
Comments
The run will stop once the total number of episodes exceeds that threshold. There is no way to stop at an exact episode count, but the approximate value should be good enough.
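To illustrate why the stop is only approximate, here is a minimal pure-Python sketch. It does not use Ray; `simulated_iterations` and `run_until` are hypothetical names made up for this example. It assumes that the stop criteria are compared against the cumulative `episodes_total` only once per reported training iteration, and that each iteration collects several episodes, so the run ends at the first iteration boundary at or past the threshold.

```python
from typing import Dict, Iterator


def simulated_iterations(episodes_per_iteration: int) -> Iterator[Dict[str, int]]:
    """Yield one result dict per training iteration (hypothetical stand-in for a trainer)."""
    total = 0
    while True:
        total += episodes_per_iteration
        yield {"episodes_total": total}


def run_until(stop: Dict[str, int], episodes_per_iteration: int) -> Dict[str, int]:
    """Return the first reported result in which any stop threshold is reached."""
    for result in simulated_iterations(episodes_per_iteration):
        # Assumption: the check happens only here, between iterations,
        # never in the middle of collecting an iteration's episodes.
        if any(result[key] >= threshold for key, threshold in stop.items()):
            return result


# With 8 episodes per iteration the run overshoots the threshold of 100.
final = run_until({"episodes_total": 100}, episodes_per_iteration=8)
print(final["episodes_total"])
```

Under these assumptions the overshoot is at most one iteration's worth of episodes, and the stop is exact only when the threshold happens to be a multiple of the per-iteration episode count.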
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for opening the issue!
## Why are these changes needed?

Make the deployment retry count configurable through an environment variable.

## Related issue number

This PR addresses #5071

Since I did not find any references to this behavior in the public docs, I decided not to update any `docs`; let me know if that's not true.

- Testing Strategy

### updated unit tests

### manual test

1. Create a simple application

```
import logging
import requests
from fastapi import FastAPI
from ray import serve

fastapi = FastAPI()
logger = logging.getLogger("ray.serve")


@serve.deployment(name="fastapi-deployment", num_replicas=2)
@serve.ingress(fastapi)
class FastAPIDeployment:
    def __init__(self):
        self.counter = 0
        raise Exception("test")

    # FastAPI automatically parses the HTTP request.
    @fastapi.get("/hello")
    def say_hello(self, name: str) -> str:
        logger.info("Handling request!")
        return f"Hello {name}!"


my_app = FastAPIDeployment.bind()
```

2. Ran the application from the local CLI

```
MAX_PER_REPLICA_RETRY_MULTIPLIER=1 serve run test:my_app
```

3. From the logs I can see that we are only retrying once per replica instead of the default `3`: https://gist.github.com/abrarsheikh/e85e00bb94ba443f76f77220b6ace530. Since my app contains 2 replicas, the code retries 2 * 1 times, as expected.

4. Running without overriding the env variable (`serve run test:my_app`) retries 6 times.

---------

Signed-off-by: Abrar Sheikh <abrar2002as@gmail.com>
Signed-off-by: Abrar Sheikh <abrar@abrar-FK79L5J97K.local>
Co-authored-by: Saihajpreet Singh <c-saihajpreet.singh@anyscale.com>
Co-authored-by: Abrar Sheikh <abrar@abrar-FK79L5J97K.local>
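The retry arithmetic reported in the manual test (2 replicas with a multiplier of 1 gives 2 retries, and the default multiplier of 3 gives 6) can be sketched as follows. This is only an illustration of the numbers quoted above, not Ray Serve's actual implementation: `max_deployment_retries` and `DEFAULT_MULTIPLIER` are hypothetical names, and how Ray reads the variable internally is an assumption.

```python
import os
from typing import Mapping, Optional

# Default multiplier as stated in the commit message above.
DEFAULT_MULTIPLIER = 3


def max_deployment_retries(num_replicas: int,
                           env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: total retries = replicas * per-replica multiplier."""
    env = os.environ if env is None else env
    multiplier = int(env.get("MAX_PER_REPLICA_RETRY_MULTIPLIER", DEFAULT_MULTIPLIER))
    return num_replicas * multiplier


# Override set to 1, two replicas.
print(max_deployment_retries(2, {"MAX_PER_REPLICA_RETRY_MULTIPLIER": "1"}))
# No override, two replicas.
print(max_deployment_retries(2, {}))
```

Passing the environment as a mapping keeps the sketch testable without changing the real process environment.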
Does this framework support running a fixed number of training/testing episodes? I added a stop criterion when calling the `tune.run()` method, for example `stop={"episodes_total": 100}`, but the final result showed that it ran for more than 100 episodes.