
Add Stream API deployment #573

Open · wants to merge 1 commit into main

Conversation

@Vinlic (Contributor) commented Apr 13, 2023

This script streams the model's response as it is generated, so users no longer have to wait for the complete reply. Requests to the endpoint return an 'event-stream' response.
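For context, an event-stream response is just a sequence of UTF-8 text frames. Below is a minimal sketch of the framing: the `data:` field and the blank-line delimiter come from the SSE specification, while the payload shape (`delta`) is a hypothetical example, not necessarily the exact schema this PR emits.

```python
import json

def sse_frame(payload: dict) -> str:
    """Serialize one Server-Sent Events frame: a `data:` line
    followed by a blank line that terminates the event."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"

# Each partial model response becomes one frame on the wire.
chunks = [{"delta": "Hel"}, {"delta": "lo"}]
stream = "".join(sse_frame(c) for c in chunks)
print(stream)
```

A client reads these frames one at a time and appends each `delta` to the text shown to the user, which is what removes the wait for the full response.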

@Vinlic (Contributor, Author) commented Apr 14, 2023

This PR uses the same SSE (Server-Sent Events) transport as ChatGPT: the server pushes data to the client, which performs better than both a blocking synchronous response and a WebSocket-based solution.
demo

@ninghongbo123 commented

@Vinlic Thanks for the stream API solution. In my testing, the server reports the error below (it does not stop the program from running), and the client, using requests, fails with: requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)). Have you run into this?
Server-side error: RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

@yhyu13 commented Apr 23, 2023

@ALL From GPT-4

This error indicates an attempt to use an unavailable CUDA device. RuntimeError: CUDA error: invalid device ordinal means an inaccessible GPU was requested, typically because the device ID is set incorrectly or the system does not have that many GPUs. You can fix it by checking the value of DEVICE_ID and making sure it points to an available GPU.

First, check how many GPU devices are available on your system by running:

nvidia-smi

This lists the GPUs on your system along with related information. Make sure the device ID you choose (DEVICE_ID in the script) is within the range of available devices.

If you have only one GPU, set DEVICE_ID to "0":

DEVICE_ID = "0"
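The advice above can be condensed into a small guard that fails early with a clear message instead of surfacing "invalid device ordinal" deep inside CUDA. This is a hypothetical helper (the name `resolve_device_id` is mine); with PyTorch you would pass `torch.cuda.device_count()` as `available`.

```python
def resolve_device_id(requested: str, available: int) -> int:
    """Validate a requested CUDA device ordinal against the number
    of visible devices, so a bad DEVICE_ID fails at startup rather
    than as a 'CUDA error: invalid device ordinal' later."""
    device_id = int(requested)
    if not 0 <= device_id < available:
        raise ValueError(
            f"DEVICE_ID={requested!r} is out of range: only {available} "
            f"CUDA device(s) visible (valid ids: 0..{available - 1})"
        )
    return device_id

# With a single visible GPU, only "0" is accepted.
print(resolve_device_id("0", available=1))
```

Calling this once before model loading makes the misconfiguration obvious, instead of letting the asynchronous CUDA error corrupt the in-flight SSE response on the client.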

As for the client-side requests.exceptions.ChunkedEncodingError: the requests library has no built-in support for Server-Sent Events (SSE) responses. You can use a library such as httpx or aiohttp, both of which support asynchronous requests and streaming responses.

For example, with the httpx library, first install it:

pip install httpx

Then receive the server-sent events with the following code:

import asyncio
import httpx

url = "http://127.0.0.1:8010"
data = {
    "input": "你好ChatGLM",
    "max_length": 2048,
    "top_p": 0.7,
    "temperature": 0.95,
    "history": [],
    "html_entities": True,
}

async def main():
    # Stream the response line by line instead of buffering the whole body.
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", url, json=data) as response:
            async for line in response.aiter_lines():
                print(line)

asyncio.run(main())

This should resolve the issue on the client side.

@liseri commented Apr 25, 2023

I just opened a PR, #808, but yours is better than my implementation. However, your code does not return history. Could you tidy it up and round it out? Your SSE code looks like the right approach.
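On the point about history not being returned: one option is to stream the partial deltas as before and attach the updated conversation history to a final event. A hedged sketch follows; the event schema (`delta`, `finish`, `response`, `history` keys) is my assumption, not what stream_api.py currently emits.

```python
import json

def build_events(query: str, answer_parts: list, history: list) -> list:
    """Build SSE data frames: one per partial answer chunk, then a
    final frame carrying the full answer and the updated history."""
    events = [
        f"data: {json.dumps({'delta': part}, ensure_ascii=False)}\n\n"
        for part in answer_parts
    ]
    answer = "".join(answer_parts)
    final = {
        "finish": True,
        "response": answer,
        "history": history + [(query, answer)],
    }
    events.append(f"data: {json.dumps(final, ensure_ascii=False)}\n\n")
    return events

frames = build_events("你好", ["你", "好！"], history=[])
print(frames[-1])
```

The client can then feed the returned history back into the next request's `history` field, matching how the non-streaming ChatGLM API round-trips conversation state.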

@sportzhang commented

Hello, is there an example request? Could you share the request page you used? Thanks!

@llxxxll commented May 21, 2023

> Hello, is there an example request? Could you share the request page you used? Thanks!

@Vinlic's work is great.
Based on my experience using it, here is an attempt at examples and usage instructions for this part.

API deployment
First install the extra dependency pip install sse_starlette, then run stream_api.py from the repository:

python stream_api.py

By default it is served on local port 8010 and is called via POST:

curl -X POST "http://127.0.0.1:8010" \
     -H 'Content-Type: application/json' \
     -d '{"input": "你好"}'

The returned value is:

stream context

@Shawn4742 commented

How would I make this request using requests.post()?
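In answer to that question: requests has no SSE parser, but it can still consume a chunked event-stream response if you pass `stream=True` and iterate lines yourself (the ChunkedEncodingError earlier in this thread came from the server-side CUDA crash, not from requests). A sketch, reusing the URL and payload from the curl example above; the `parse_sse_data` helper is mine:

```python
def parse_sse_data(line: str):
    """Return the payload of a `data:` SSE line, or None for other
    fields, comments, and the blank line that ends an event."""
    if line.startswith("data:"):
        return line[len("data:"):].strip()
    return None

if __name__ == "__main__":
    import requests  # third-party; pip install requests

    with requests.post(
        "http://127.0.0.1:8010",
        json={"input": "你好"},
        stream=True,  # do not buffer the whole body before returning
    ) as resp:
        for raw in resp.iter_lines(decode_unicode=True):
            data = parse_sse_data(raw)
            if data is not None:
                print(data)
```

`iter_lines` yields each line as the chunks arrive, so the partial responses print incrementally, the same effect as the httpx version earlier in the thread.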

7 participants