
Add Stream API deployment #573

Open · wants to merge 1 commit into main

Conversation

@Vinlic (Contributor) commented Apr 13, 2023

This script streams the model's response as it is generated, so users no longer have to wait for the complete reply. Requests to the endpoint return an 'event-stream' response.
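For context, an event-stream response is just a sequence of UTF-8 text frames. Below is a minimal sketch of the framing: the `data:` field and the blank-line delimiter come from the SSE specification, while the payload shape (`delta`) is a hypothetical example, not necessarily the exact schema this PR emits.

```python
import json

def sse_frame(payload: dict) -> str:
    """Serialize one Server-Sent Events frame: a `data:` line
    followed by a blank line that terminates the event."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"

# Each partial model response becomes one frame on the wire.
chunks = [{"delta": "Hel"}, {"delta": "lo"}]
stream = "".join(sse_frame(c) for c in chunks)
print(stream)
```

A client reads these frames one at a time and appends each `delta` to the text shown to the user, which is what removes the wait for the full response.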

@Vinlic (Contributor, Author) commented Apr 14, 2023

This PR uses the same SSE (Server-Sent Events) transport as ChatGPT: the server pushes data to the client, which performs better than both a blocking synchronous response and a WebSocket-based solution.
demo

@ninghongbo123 commented

@Vinlic Thanks for the stream API solution. In my testing, the server reports the error below (it does not stop the program from running), and the client, using requests, fails with: requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)). Have you run into this?
Server-side error: RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

@yhyu13 commented Apr 23, 2023

@ALL From GPT-4

This error indicates an attempt to use an unavailable CUDA device. RuntimeError: CUDA error: invalid device ordinal means an inaccessible GPU was requested, typically because the device ID is set incorrectly or the system does not have that many GPUs. You can fix it by checking the value of DEVICE_ID and making sure it points to an available GPU.

First, check how many GPU devices are available on your system by running:

nvidia-smi

This lists the GPUs on your system along with related information. Make sure the device ID you choose (DEVICE_ID in the script) is within the range of available devices.

If you have only one GPU, set DEVICE_ID to "0":

DEVICE_ID = "0"
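The advice above can be condensed into a small guard that fails early with a clear message instead of surfacing "invalid device ordinal" deep inside CUDA. This is a hypothetical helper (the name `resolve_device_id` is mine); with PyTorch you would pass `torch.cuda.device_count()` as `available`.

```python
def resolve_device_id(requested: str, available: int) -> int:
    """Validate a requested CUDA device ordinal against the number
    of visible devices, so a bad DEVICE_ID fails at startup rather
    than as a 'CUDA error: invalid device ordinal' later."""
    device_id = int(requested)
    if not 0 <= device_id < available:
        raise ValueError(
            f"DEVICE_ID={requested!r} is out of range: only {available} "
            f"CUDA device(s) visible (valid ids: 0..{available - 1})"
        )
    return device_id

# With a single visible GPU, only "0" is accepted.
print(resolve_device_id("0", available=1))
```

Calling this once before model loading makes the misconfiguration obvious, instead of letting the asynchronous CUDA error corrupt the in-flight SSE response on the client.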

As for the client-side requests.exceptions.ChunkedEncodingError: the requests library has no built-in support for Server-Sent Events (SSE) responses. You can use a library such as httpx or aiohttp, both of which support asynchronous requests and streaming responses.

For example, with the httpx library, first install it:

pip install httpx

Then receive the server-sent events with the following code:

import asyncio
import httpx

url = "http://127.0.0.1:8010"
data = {
    "input": "你好ChatGLM",
    "max_length": 2048,
    "top_p": 0.7,
    "temperature": 0.95,
    "history": [],
    "html_entities": True,
}

async def main():
    # Stream the response line by line instead of buffering the whole body.
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", url, json=data) as response:
            async for line in response.aiter_lines():
                print(line)

asyncio.run(main())

This should resolve the issue on the client side.

@liseri commented Apr 25, 2023

I just opened a PR, #808, but yours is better than my implementation. However, your code does not return history. Could you tidy it up and round it out? Your SSE code looks like the right approach.
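On the point about history not being returned: one option is to stream the partial deltas as before and attach the updated conversation history to a final event. A hedged sketch follows; the event schema (`delta`, `finish`, `response`, `history` keys) is my assumption, not what stream_api.py currently emits.

```python
import json

def build_events(query: str, answer_parts: list, history: list) -> list:
    """Build SSE data frames: one per partial answer chunk, then a
    final frame carrying the full answer and the updated history."""
    events = [
        f"data: {json.dumps({'delta': part}, ensure_ascii=False)}\n\n"
        for part in answer_parts
    ]
    answer = "".join(answer_parts)
    final = {
        "finish": True,
        "response": answer,
        "history": history + [(query, answer)],
    }
    events.append(f"data: {json.dumps(final, ensure_ascii=False)}\n\n")
    return events

frames = build_events("你好", ["你", "好！"], history=[])
print(frames[-1])
```

The client can then feed the returned history back into the next request's `history` field, matching how the non-streaming ChatGLM API round-trips conversation state.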

@sportzhang commented

Hello, is there an example request? Could you share the request page you used? Thanks!

@llxxxll commented May 21, 2023

> Hello, is there an example request? Could you share the request page you used? Thanks!

@Vinlic's work is great.
Based on my experience using it, here is an attempt at examples and usage instructions for this part.

API deployment
First install the extra dependency pip install sse_starlette, then run stream_api.py from the repository:

python stream_api.py

By default it is served on local port 8010 and is called via POST:

curl -X POST "http://127.0.0.1:8010" \
     -H 'Content-Type: application/json' \
     -d '{"input": "你好"}'

The returned value is:

stream context

@Shawn4742 commented

How would I make this request using requests.post()?
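In answer to that question: requests has no SSE parser, but it can still consume a chunked event-stream response if you pass `stream=True` and iterate lines yourself (the ChunkedEncodingError earlier in this thread came from the server-side CUDA crash, not from requests). A sketch, reusing the URL and payload from the curl example above; the `parse_sse_data` helper is mine:

```python
def parse_sse_data(line: str):
    """Return the payload of a `data:` SSE line, or None for other
    fields, comments, and the blank line that ends an event."""
    if line.startswith("data:"):
        return line[len("data:"):].strip()
    return None

if __name__ == "__main__":
    import requests  # third-party; pip install requests

    with requests.post(
        "http://127.0.0.1:8010",
        json={"input": "你好"},
        stream=True,  # do not buffer the whole body before returning
    ) as resp:
        for raw in resp.iter_lines(decode_unicode=True):
            data = parse_sse_data(raw)
            if data is not None:
                print(data)
```

`iter_lines` yields each line as the chunks arrive, so the partial responses print incrementally, the same effect as the httpx version earlier in the thread.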

7 participants