Supporting fixed number of training/testing episodes? #5071

peterhcyuen · 2019-06-30T06:02:31Z

Does this framework support a fixed number of training/testing episodes? I added a stop criterion when calling tune.run(), for example stop={"episodes_total": 100}, but the final result showed that it ran for more than 100 episodes.

ericl · 2019-07-01T07:48:57Z

It will stop once the number of episodes exceeds that threshold. There is no way to do an exact stop, but the approximate value should be good enough.
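To illustrate why the stop is only approximate, here is a minimal sketch in plain Python (not Ray code). It assumes the stop criterion is checked once per reported training result, and that each result can add several episodes at once. The helper `run_until` and the per-iteration episode counts are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable


def run_until(stop: Dict[str, int], episodes_per_iteration: Iterable[int]) -> int:
    """Hypothetical stand-in for a training loop.

    Assumption: the stop criterion is evaluated only after a whole
    training iteration has reported its result, never mid-iteration.
    """
    episodes_total = 0
    for completed in episodes_per_iteration:
        # One iteration may finish many episodes before the check runs.
        episodes_total += completed
        # Assumed check: stop once the total reaches or passes the threshold.
        if episodes_total >= stop["episodes_total"]:
            break
    return episodes_total


# Each iteration completes 30 episodes, so the total goes 30, 60, 90, 120.
final = run_until({"episodes_total": 100}, [30, 30, 30, 30, 30])
print(final)  # 120
```

Under these assumptions the overshoot is bounded by the number of episodes completed in a single iteration, so smaller iterations give a stop closer to the requested count.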

stale · 2020-11-15T06:40:08Z

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale · 2020-11-29T07:18:18Z

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

## Why are these changes needed?

Make deployment retry count configurable through environment variable

## Related issue number

This PR addresses #5071

Since I did not find any references to this behavior in the public doc, I decided not to update any `docs`; let me know if that's not true.

- Testing Strategy

### updated unit tests

### manual test

1. Create a simple application

```
import logging
import requests
from fastapi import FastAPI
from ray import serve

fastapi = FastAPI()
logger = logging.getLogger("ray.serve")


@serve.deployment(name="fastapi-deployment", num_replicas=2)
@serve.ingress(fastapi)
class FastAPIDeployment:
    def __init__(self):
        self.counter = 0
        # Fail on purpose so that replica startup is retried.
        raise Exception("test")

    # FastAPI automatically parses the HTTP request.
    @fastapi.get("/hello")
    def say_hello(self, name: str) -> str:
        logger.info("Handling request!")
        return f"Hello {name}!"


my_app = FastAPIDeployment.bind()
```

2. Ran the application from the local CLI

```
MAX_PER_REPLICA_RETRY_MULTIPLIER=1 serve run test:my_app
```

3. From the logs I can see that we are only retrying once per replica instead of the default `3` times: https://gist.github.com/abrarsheikh/e85e00bb94ba443f76f77220b6ace530. Since my app contains 2 replicas, the code retries 2 * 1 times as expected.

4. Running without overriding the env variable (`serve run test:my_app`) retries 6 times.

---------

Signed-off-by: Abrar Sheikh <abrar2002as@gmail.com>
Signed-off-by: Abrar Sheikh <abrar@abrar-FK79L5J97K.local>
Co-authored-by: Saihajpreet Singh <c-saihajpreet.singh@anyscale.com>
Co-authored-by: Abrar Sheikh <abrar@abrar-FK79L5J97K.local>
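The retry arithmetic from the manual test can be sketched as follows. This is only an illustration of the numbers reported above (2 replicas, multiplier 1 gives 2 retries; default multiplier 3 gives 6), not the actual Ray Serve implementation. The function name `max_deploy_retries` and the `env` parameter are hypothetical; only the variable name `MAX_PER_REPLICA_RETRY_MULTIPLIER` and the default of 3 come from the description.

```python
import os
from typing import Mapping, Optional

# Default multiplier as stated in the PR description.
DEFAULT_MULTIPLIER = 3


def max_deploy_retries(num_replicas: int,
                       env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: total retries = num_replicas * multiplier."""
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    multiplier = int(env.get("MAX_PER_REPLICA_RETRY_MULTIPLIER", DEFAULT_MULTIPLIER))
    return num_replicas * multiplier


# Override set to 1, as in step 2 of the manual test.
print(max_deploy_retries(2, {"MAX_PER_REPLICA_RETRY_MULTIPLIER": "1"}))  # 2
# No override, as in step 4.
print(max_deploy_retries(2, {}))  # 6
```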


ericl added the "question" label Jul 1, 2019

stale bot added the "stale" label Nov 15, 2020

stale bot closed this as completed Nov 29, 2020

abrarsheikh mentioned this issue Feb 27, 2025

make replica retry count configurable #50960

Merged
