API server: better concurrency support #1018

is · 2023-05-14T15:38:59Z

Run `model.chat` asynchronously through a ThreadPoolExecutor so that several simultaneous requests can be answered concurrently.
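A minimal sketch of this idea, assuming an asyncio-based server such as FastAPI: the blocking chat call is handed to a `ThreadPoolExecutor` through `loop.run_in_executor`, so the event loop stays free to accept other requests while inference runs. `fake_chat` is a hypothetical stand-in for `model.chat`, and `max_workers=4` is an arbitrary choice.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_chat(prompt: str, history: list) -> tuple:
    """Hypothetical stand-in for model.chat(tokenizer, prompt, history=...)."""
    time.sleep(0.2)  # simulate blocking inference latency
    response = f"echo: {prompt}"
    return response, history + [(prompt, response)]


# One shared pool for the whole process; max_workers is an assumption.
executor = ThreadPoolExecutor(max_workers=4)


async def handle_chat(prompt: str, history: list) -> dict:
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop can keep serving requests.
    response, new_history = await loop.run_in_executor(
        executor, fake_chat, prompt, history
    )
    return {"response": response, "history": new_history}


async def main() -> list:
    # Two "requests" arriving at the same time; gather keeps their order.
    return await asyncio.gather(
        handle_chat("hello", []),
        handle_chat("hi", []),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(results, f"{time.perf_counter() - start:.2f}s")
```

The worker threads still share one model object, so this keeps the server responsive but does not by itself give parallel inference on a single GPU.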

is · 2023-05-15T09:55:15Z

Following #808, the request/response structures are now defined with pydantic.
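A sketch of what such pydantic models could look like. The field names follow the `Params` usage in the `/stream_chat` example later in this thread; the default values and the `ChatResponse` model are assumptions for illustration. In FastAPI, declaring an endpoint argument as `params: Params` makes the framework parse and validate the JSON body, and `response_model=ChatResponse` on the route decorator declares the output shape.

```python
from typing import List, Tuple

from pydantic import BaseModel, Field


class Params(BaseModel):
    """Request body; field names match this thread, defaults are assumptions."""
    prompt: str
    history: List[Tuple[str, str]] = Field(default_factory=list)
    max_length: int = 2048
    top_p: float = 0.7
    temperature: float = 0.95


class ChatResponse(BaseModel):
    """Hypothetical response body for the non-streaming endpoint."""
    response: str
    history: List[Tuple[str, str]]


if __name__ == "__main__":
    params = Params(prompt="hi")
    reply = ChatResponse(response="ok", history=params.history + [(params.prompt, "ok")])
    print(params, reply)
```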

hellocxj · 2023-05-27T13:58:55Z

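Once the web API defines its input and output structures with pydantic, can it support chat_stream mode?
Also, on a single GPU in CHAT_STREAM mode, when several people ask at the same time, the answers get mixed into other people's questions. How can this be solved?
And with multiple GPUs, if I start several processes serving requests on different ports and put NGINX in front to forward everything from one port, will I run into the same problem of answers getting mixed up?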

is · 2023-05-27T14:57:33Z

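This PR does not achieve true concurrency on a single GPU, and even queuing multiple requests may be problematic, so I have simply closed it.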

For CHAT_STREAM, the way the thread pool is used has to change: the streamed output of each individual session should be produced synchronously.
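One possible reading of this advice, sketched under assumptions: a single lock guards the shared model, and each session's stream holds it for its entire generation, so concurrent requests queue instead of interleaving on the same model. `fake_stream_chat` is a hypothetical stand-in for `model.stream_chat`; like the real call in the example below, it yields the full response generated so far.

```python
import threading
from typing import Iterator, List, Tuple

# One lock guarding the single model instance shared by all requests
# in this process (an assumption about the deployment).
model_lock = threading.Lock()


def fake_stream_chat(prompt: str, history: List[Tuple[str, str]]):
    """Hypothetical stand-in for model.stream_chat: yields the full text so far."""
    text = ""
    for ch in f"echo: {prompt}":
        text += ch
        yield text, history + [(prompt, text)]


def serialized_stream(prompt: str, history: List[Tuple[str, str]]) -> Iterator[str]:
    """Stream one session's reply while holding the model lock for the whole run."""
    with model_lock:
        sent = 0
        for response, _ in fake_stream_chat(prompt, history):
            if len(response) > sent:
                yield response[sent:]  # only the newly generated suffix
                sent = len(response)


if __name__ == "__main__":
    results = {}

    def consume(name: str) -> None:
        results[name] = "".join(serialized_stream(name, []))

    threads = [threading.Thread(target=consume, args=(n,)) for n in ("alice", "bob")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)  # each session receives only its own reply
```

A plain `threading.Lock` (not an `RLock`) is used because it may be released by a different thread than the one that acquired it, which can happen when a server resumes a synchronous generator on different thread-pool threads. The lock is released only when the generator finishes or is closed, so requests are served one after another rather than in parallel.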

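Multiple GPUs are actually easier. FastAPI itself supports a multi-worker mode: start as many workers as there are GPUs, and during each worker's initialization restrict which GPU it uses with CUDA_VISIBLE_DEVICES or a similar mechanism.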

is · 2023-05-27T14:57:55Z

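```python
from fastapi import status
from fastapi.responses import StreamingResponse

# `app`, `model`, `tokenizer`, `torch_gc` and the pydantic `Params` model
# are assumed to be defined elsewhere (e.g. in api.py).


def model_stream_predict(input_text, max_length, top_p, temperature, history=None):
    # Avoid a mutable default argument shared across requests.
    history = history or []
    next_response_index = 0
    # stream_chat yields the full response generated so far on every step.
    for response, history in model.stream_chat(tokenizer, input_text, history=history,
                                               max_length=max_length,
                                               top_p=top_p,
                                               temperature=temperature):
        # Send only the newly generated suffix to the client.
        if len(response) > next_response_index:
            yield response[next_response_index:]
            next_response_index = len(response)
    torch_gc()  # release cached GPU memory once the stream has finished


@app.post("/stream_chat")
async def create_stream_chat(params: Params) -> StreamingResponse:
    model_response = model_stream_predict(
        params.prompt,
        history=params.history,
        max_length=params.max_length,
        top_p=params.top_p,
        temperature=params.temperature,
    )
    return StreamingResponse(
        content=model_response,
        status_code=status.HTTP_200_OK,
        media_type="text/plain",  # the chunks are plain text, not HTML
    )
```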

hellocxj · 2023-05-27T15:13:09Z

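Regarding "the way the thread pool is used has to change: the streamed output of each individual session should be produced synchronously": could the example code for this be based on the implementation described as "Run `model.chat` asynchronously through a ThreadPoolExecutor so that several simultaneous requests can be answered concurrently"?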

hellocxj · 2023-05-27T15:16:59Z

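One more question. "FastAPI itself supports a multi-worker mode: start as many workers as there are GPUs, and during each worker's initialization restrict which GPU it uses with CUDA_VISIBLE_DEVICES or a similar mechanism." Is there an example of this? What would be a sensible way to set it up?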

is · 2023-05-27T15:43:24Z

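I looked through the FastAPI/uvicorn documentation. Since there is no simple way to pass a worker ID to each worker, binding each worker to its own GPU is not very convenient. It is more straightforward to put an HTTP server in front for load balancing and start one worker per GPU.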
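A sketch of the "one worker per GPU behind a front HTTP server" setup, under stated assumptions: the FastAPI app is importable as `api:app`, uvicorn is installed, and ports starting at 8001 are free (the port numbers are hypothetical). Each process gets its own `CUDA_VISIBLE_DEVICES`, so inside that process its card appears as device 0; a load balancer such as NGINX would then forward the public port to these backend ports (its configuration is not shown).

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def worker_commands(num_gpus: int, base_port: int = 8001) -> List[Tuple[List[str], Dict[str, str]]]:
    """Build one (command, environment) pair per GPU."""
    specs = []
    for gpu in range(num_gpus):
        # Each process sees only its own card, which then appears as cuda:0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        cmd = [sys.executable, "-m", "uvicorn", "api:app",
               "--host", "127.0.0.1", "--port", str(base_port + gpu)]
        specs.append((cmd, env))
    return specs


def launch(num_gpus: int) -> List[subprocess.Popen]:
    """Start every worker process; the front HTTP server balances across their ports."""
    return [subprocess.Popen(cmd, env=env) for cmd, env in worker_commands(num_gpus)]


if __name__ == "__main__":
    # Dry run: print what would be started. Call launch(n) to actually start workers.
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    for cmd, env in worker_commands(n):
        print(env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
```

Because each process loads its own copy of the model on its own card, requests routed to different ports do not share model state.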

whitesay · 2023-08-27T02:48:21Z

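> Regarding "the way the thread pool is used has to change: the streamed output of each individual session should be produced synchronously": could the example code for this be based on the implementation described as "Run `model.chat` asynchronously through a ThreadPoolExecutor so that several simultaneous requests can be answered concurrently"?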

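Is there a solution to this? I have run into a similar problem: when streaming output over SSE with synchronous execution, answers get mixed into other people's questions, which then leads to ASGI application errors. Is it possible to serve multiple concurrent streams without them interfering with each other? Thanks!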

whitesay · 2023-08-27T02:53:23Z

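> Also, on a single GPU in CHAT_STREAM mode, when several people ask at the same time, the answers get mixed into other people's questions. How can this be solved?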

Have you managed to solve this? Thanks!

is force-pushed the api-2 branch from 7e335cc to b709499 May 15, 2023 05:59

is added 2 commits May 15, 2023 21:48

api.py: support concurrent serving.

9cc1bd5

Define the input and output structures with pydantic.

faf8353

is force-pushed the api-2 branch from b709499 to faf8353 May 15, 2023 13:48

is closed this May 15, 2023
