The service crashes if the model takes a long time to respond #3168

Open
yurkoff-mv opened this issue May 31, 2024 · 12 comments

@yurkoff-mv

🐛 Describe the bug

TorchServe version is 0.10.0.
Here is my code:

import base64
import pickle
from typing import Union

import grpc

# Generated from TorchServe's inference.proto (the inference_pb2*.py modules must be on the path).
import inference_pb2
import inference_pb2_grpc


def get_inference_stub(address: str, port: Union[str, int] = 7070):
    # Allow 1 GiB request/response messages and use a long reconnect backoff.
    channel = grpc.insecure_channel(address + ':' + str(port),
                                    options=[
                                        ('grpc.max_send_message_length', 1073741824),
                                        ('grpc.max_receive_message_length', 1073741824),
                                        ('grpc.initial_reconnect_backoff_ms', 30000)
                                    ],
                                    )
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub


def infer(stub, data, model_name: str):
    # Pickle and base64-encode the payload; the custom handler decodes it on the worker side.
    data = base64.b64encode(pickle.dumps(data, protocol=3))
    input_data = {'data': data}
    try:
        response = stub.Predictions(inference_pb2.PredictionsRequest(model_name=model_name, input=input_data), timeout=330)
        prediction = pickle.loads(base64.b64decode(response.prediction))
        # prediction = pickle.loads(base64.b64decode(bytes(response.prediction, 'utf-8')))
        return prediction
    except grpc.RpcError as e:
        # Surface the gRPC status instead of failing silently.
        print(f"gRPC call failed: {e.code()} - {e.details()}")
        exit(1)
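
For context, a minimal (hypothetical) invocation of these helpers looks roughly like the following; the address and payload are placeholders, not the values used in production:

if __name__ == "__main__":
    stub = get_inference_stub("localhost", 7070)
    # Placeholder payload; the real request is whatever the custom handler unpickles.
    result = infer(stub, {"prompt": "Hello"}, model_name="vicuna-13b-16k")
    print(result)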

Timeouts on the client are set quite large.

My config:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
max_request_size=1073741824
max_response_size=1073741824
install_py_dep_per_model=true
number_of_gpu=1
number_of_netty_threads=32
job_queue_size=1
async_logging=true
model_store=/home/model-server/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"vicuna-13b-16k":{"1.6":{"defaultVersion":true,"marName":"vicuna-13b-16k.mar","minWorkers":1,"maxWorkers":1,"batchSize":1,"maxBatchDelay":300000,"responseTimeout":300000}}}}
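
To double-check what the frontend actually registered, a minimal sketch (assuming the management API is reachable on port 8081 as configured above, and the model name matches model_snapshot):

import json
import urllib.request

# Describe the registered model via the management API (port 8081 per the config above);
# the returned JSON should include the effective batchSize, maxBatchDelay and responseTimeout.
with urllib.request.urlopen("http://localhost:8081/models/vicuna-13b-16k") as resp:
    print(json.dumps(json.load(resp), indent=2))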

Error logs

2024-05-31T17:58:39,322 [INFO ] epollEventLoopGroup-3-12 ACCESS_LOG - /192.168.21.103:50546 "GET /models HTTP/1.1" 200 0
2024-05-31T17:58:39,322 [INFO ] epollEventLoopGroup-3-12 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:73074e851dd3,timestamp:1717178319
2024-05-31T17:58:45,339 [INFO ] W-9000-vicuna-13b-16k-stdout MODEL_LOG - Model prediction succeeded.
2024-05-31T17:58:45,340 [INFO ] W-9000-vicuna-13b-16k-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:397567.15|#ModelName:megagpt-1dot6,Level:Model|#type:GAUGE|#hostname:73074e851dd3,1717178325,3b84f7cc-0286-49f9-a7dc-854efd7c63d7, pattern=[METRICS]
2024-05-31T17:58:45,340 [INFO ] W-9000-vicuna-13b-16k-stdout MODEL_METRICS - PredictionTime.ms:397567.15|#ModelName:megagpt-1dot6,Level:Model|#hostname:73074e851dd3,requestID:3b84f7cc-0286-49f9-a7dc-854efd7c63d7,timestamp:1717178325
2024-05-31T17:58:45,340 [WARN ] W-9000-vicuna-13b-16k org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 3b84f7cc-0286-49f9-a7dc-854efd7c63d7
2024-05-31T17:58:45,340 [ERROR] W-9000-vicuna-13b-16k org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
	at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
	at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
	at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:103) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
2024-05-31T17:58:45,353 [WARN ] W-9000-vicuna-13b-16k org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 3b84f7cc-0286-49f9-a7dc-854efd7c63d7
2024-05-31T17:58:45,354 [INFO ] W-9000-vicuna-13b-16k ACCESS_LOG - /192.168.21.103:50831 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 397582
2024-05-31T17:58:45,354 [INFO ] W-9000-vicuna-13b-16k-stdout MODEL_LOG - Frontend disconnected.
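
As a rough sanity check against the numbers above (a sketch using only the values from infer() and the worker log):

# Values taken from the report above.
client_deadline_s = 330                   # gRPC per-call timeout passed to stub.Predictions()
prediction_time_s = 397567.15 / 1000.0    # PredictionTime.Milliseconds from the worker log

# The prediction outlasts the client-side deadline, so gRPC cancels the call first;
# the worker then logs "grpc client call already cancelled" and raises the
# IllegalStateException when it tries to write to the terminated stream.
print(prediction_time_s > client_deadline_s)   # True (~397.6 s vs 330 s)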

Installation instructions

I use Docker.

Model Packaging

The model is packaged using the model archiver (torch-model-archiver).

config.properties

No response

Versions

torch==2.3.0+cu121; sys_platform == 'linux'
torchvision==0.18.0+cu121; sys_platform == 'linux'
torchtext==0.18.0; sys_platform == 'linux'
torchaudio==2.3.0+cu121; sys_platform == 'linux'
torchserve==0.10.0
torch-model-archiver

Repro instructions

torchserve --start

Possible Solution

No response

@agunapal added the grpc label Jun 3, 2024
@thakursc1

I am also facing the same issue

@thakursc1

@agunapal Is there any workaround for this issue as of now?

@agunapal
Collaborator

Hi @thakursc1, I have brought this up internally. I will get back to you.

@thakursc1

@agunapal, would downgrading to an older version help? I see there was previously an effort to fix this: #2420

But the error seems slightly different. What do you recommend?

@agunapal
Collaborator

I don't think downgrading would help. Did you try with HTTP?

@thakursc1

No, our services are mostly gRPC.

@thakursc1

Downgrading to torchserve==0.9.0 solved this issue.

@agunapal
Collaborator

I am able to reproduce the problem:

2024-07-26T22:02:04,796 [ERROR] W-9000-llama3-8b-instruct_1.0 org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
	at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
	at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
	at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:102) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-26T22:02:04,816 [WARN ] W-9000-llama3-8b-instruct_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 0077a105-8291-4298-9f04-2be6bd151ca1
2024-07-26T22:02:04,816 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED
2024-07-26T22:02:04,817 [INFO ] W-9000-llama3-8b-instruct_1.0-stdout MODEL_LOG - Frontend disconnected.
2024-07-26T22:02:04,817 [INFO ] W-9000-llama3-8b-instruct_1.0 ACCESS_LOG - /127.0.0.1:48354 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 252726

@yurkoff-mv
Author

@agunapal, could you please tell me starting from which version of TorchServe the problem appears?

@agunapal
Collaborator

Hi @yurkoff-mv, if you try the nightlies, I believe the issue is resolved. Please let me know.

@yurkoff-mv
Author

@agunapal, thank you very much for the information.
Question for the developers: how can I find out which release of TorchServe these updates will be included in?

@agunapal
Collaborator

The next release of TorchServe will have the fix.
