The service crashes if the model takes a long time to respond #3168

Open
yurkoff-mv opened this issue May 31, 2024 · 12 comments

@yurkoff-mv

🐛 Describe the bug

TorchServe version is 0.10.0.
Here is my code:

import base64
import pickle
from typing import Union

import grpc

# Generated from TorchServe's inference.proto (the inference_pb2*.py modules must be on the path).
import inference_pb2
import inference_pb2_grpc


def get_inference_stub(address: str, port: Union[str, int] = 7070):
    # Allow 1 GiB request/response messages and use a long reconnect backoff.
    channel = grpc.insecure_channel(address + ':' + str(port),
                                    options=[
                                        ('grpc.max_send_message_length', 1073741824),
                                        ('grpc.max_receive_message_length', 1073741824),
                                        ('grpc.initial_reconnect_backoff_ms', 30000)
                                    ],
                                    )
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub


def infer(stub, data, model_name: str):
    # Pickle and base64-encode the payload; the custom handler decodes it on the worker side.
    data = base64.b64encode(pickle.dumps(data, protocol=3))
    input_data = {'data': data}
    try:
        response = stub.Predictions(inference_pb2.PredictionsRequest(model_name=model_name, input=input_data), timeout=330)
        prediction = pickle.loads(base64.b64decode(response.prediction))
        # prediction = pickle.loads(base64.b64decode(bytes(response.prediction, 'utf-8')))
        return prediction
    except grpc.RpcError as e:
        # Surface the gRPC status instead of failing silently.
        print(f"gRPC call failed: {e.code()} - {e.details()}")
        exit(1)
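
For context, a minimal (hypothetical) invocation of these helpers looks roughly like the following; the address and payload are placeholders, not the values used in production:

if __name__ == "__main__":
    stub = get_inference_stub("localhost", 7070)
    # Placeholder payload; the real request is whatever the custom handler unpickles.
    result = infer(stub, {"prompt": "Hello"}, model_name="vicuna-13b-16k")
    print(result)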

Timeouts on the client are set quite large.

My config:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
max_request_size=1073741824
max_response_size=1073741824
install_py_dep_per_model=true
number_of_gpu=1
number_of_netty_threads=32
job_queue_size=1
async_logging=true
model_store=/home/model-server/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"vicuna-13b-16k":{"1.6":{"defaultVersion":true,"marName":"vicuna-13b-16k.mar","minWorkers":1,"maxWorkers":1,"batchSize":1,"maxBatchDelay":300000,"responseTimeout":300000}}}}
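
To double-check what the frontend actually registered, a minimal sketch (assuming the management API is reachable on port 8081 as configured above, and the model name matches model_snapshot):

import json
import urllib.request

# Describe the registered model via the management API (port 8081 per the config above);
# the returned JSON should include the effective batchSize, maxBatchDelay and responseTimeout.
with urllib.request.urlopen("http://localhost:8081/models/vicuna-13b-16k") as resp:
    print(json.dumps(json.load(resp), indent=2))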

Error logs

2024-05-31T17:58:39,322 [INFO ] epollEventLoopGroup-3-12 ACCESS_LOG - /192.168.21.103:50546 "GET /models HTTP/1.1" 200 0
2024-05-31T17:58:39,322 [INFO ] epollEventLoopGroup-3-12 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:73074e851dd3,timestamp:1717178319
2024-05-31T17:58:45,339 [INFO ] W-9000-vicuna-13b-16k-stdout MODEL_LOG - Model prediction succeeded.
2024-05-31T17:58:45,340 [INFO ] W-9000-vicuna-13b-16k-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:397567.15|#ModelName:megagpt-1dot6,Level:Model|#type:GAUGE|#hostname:73074e851dd3,1717178325,3b84f7cc-0286-49f9-a7dc-854efd7c63d7, pattern=[METRICS]
2024-05-31T17:58:45,340 [INFO ] W-9000-vicuna-13b-16k-stdout MODEL_METRICS - PredictionTime.ms:397567.15|#ModelName:megagpt-1dot6,Level:Model|#hostname:73074e851dd3,requestID:3b84f7cc-0286-49f9-a7dc-854efd7c63d7,timestamp:1717178325
2024-05-31T17:58:45,340 [WARN ] W-9000-vicuna-13b-16k org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 3b84f7cc-0286-49f9-a7dc-854efd7c63d7
2024-05-31T17:58:45,340 [ERROR] W-9000-vicuna-13b-16k org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
	at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
	at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
	at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:103) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
2024-05-31T17:58:45,353 [WARN ] W-9000-vicuna-13b-16k org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 3b84f7cc-0286-49f9-a7dc-854efd7c63d7
2024-05-31T17:58:45,354 [INFO ] W-9000-vicuna-13b-16k ACCESS_LOG - /192.168.21.103:50831 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 397582
2024-05-31T17:58:45,354 [INFO ] W-9000-vicuna-13b-16k-stdout MODEL_LOG - Frontend disconnected.
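
As a rough sanity check against the numbers above (a sketch using only the values from infer() and the worker log):

# Values taken from the report above.
client_deadline_s = 330                   # gRPC per-call timeout passed to stub.Predictions()
prediction_time_s = 397567.15 / 1000.0    # PredictionTime.Milliseconds from the worker log

# The prediction outlasts the client-side deadline, so gRPC cancels the call first;
# the worker then logs "grpc client call already cancelled" and raises the
# IllegalStateException when it tries to write to the terminated stream.
print(prediction_time_s > client_deadline_s)   # True (~397.6 s vs 330 s)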

Installation instructions

I use Docker.

Model Packaging

The model is packaged using the model archiver (torch-model-archiver).

config.properties

No response

Versions

torch==2.3.0+cu121; sys_platform == 'linux'
torchvision==0.18.0+cu121; sys_platform == 'linux'
torchtext==0.18.0; sys_platform == 'linux'
torchaudio==2.3.0+cu121; sys_platform == 'linux'
torchserve==0.10.0
torch-model-archiver

Repro instructions

torchserve --start

Possible Solution

No response

@agunapal added the grpc label Jun 3, 2024
@thakursc1

I am also facing the same issue

@thakursc1

@agunapal Is there any workaround for this issue as of now?

@agunapal
Collaborator

Hi @thakursc1, I have brought this up internally. I will get back to you.

@thakursc1

@agunapal, would downgrading to an older version help? I see there was previously an effort to fix this: #2420

But the error seems slightly different. What do you recommend?

@agunapal
Collaborator

I don't think downgrading would help. Did you try with HTTP?

@thakursc1

No, our services are mostly gRPC.

@thakursc1

Downgrading to torchserve==0.9.0 solved this issue.

@agunapal
Collaborator

I am able to reproduce the problem:

2024-07-26T22:02:04,796 [ERROR] W-9000-llama3-8b-instruct_1.0 org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
	at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
	at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
	at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:102) ~[model-server.jar:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-26T22:02:04,816 [WARN ] W-9000-llama3-8b-instruct_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 0077a105-8291-4298-9f04-2be6bd151ca1
2024-07-26T22:02:04,816 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED
2024-07-26T22:02:04,817 [INFO ] W-9000-llama3-8b-instruct_1.0-stdout MODEL_LOG - Frontend disconnected.
2024-07-26T22:02:04,817 [INFO ] W-9000-llama3-8b-instruct_1.0 ACCESS_LOG - /127.0.0.1:48354 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 252726

@yurkoff-mv
Author

@agunapal, could you please tell me starting from which version of TorchServe the problem appears?

@agunapal
Collaborator

Hi @yurkoff-mv, if you try the nightlies, I believe the issue is resolved. Please let me know.

@yurkoff-mv
Author

@agunapal, thank you very much for the information.
Question for the developers: how can I find out which release of TorchServe these updates will be included in?

@agunapal
Collaborator

The next release of TorchServe will have the fix.
