Cannot test with restful_api #308
Comments
I have the same issue and don't know where it goes wrong. |
Hi @irasin, it looks like you are using an older version of MII, judging by your error message for line 31. Can you please update to the latest source builds of DeepSpeed and DeepSpeed-MII?
pip uninstall deepspeed deepspeed-mii -y
pip install git+https://github.com/microsoft/deepspeed.git
pip install git+https://github.com/microsoft/deepspeed-mii.git |
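(To confirm the source builds were picked up, a quick check; this assumes both packages expose __version__, which current releases do:)
python -c "import deepspeed, mii; print(deepspeed.__version__, mii.__version__)"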
@mrwyattii yes, that solved it! But when I send requests, another issue comes up. Would you help check this? |
Hi @mrwyattii, many thanks for your reply. After using the latest source builds of DeepSpeed and DeepSpeed-MII, the RESTful API now works.
import mii

model_name_or_path = "/dataset/huggyllama/llama-7b"
max_model_length = 2048

mii.serve(
    model_name_or_path=model_name_or_path,
    max_length=max_model_length,
    deployment_name="mii_test",
    tensor_parallel=1,
    replica_num=1,
    enable_restful_api=True,
    restful_api_port=8000,
)
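(For reference, the same deployment can also be exercised with curl; the endpoint below is derived from the deployment_name and restful_api_port in the script above:)
curl -X POST -H "Content-Type: application/json" \
     -d '{"prompts": ["DeepSpeed is"], "max_length": 128}' \
     http://localhost:8000/mii/mii_test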
import json
import requests

url = "http://localhost:8000/mii/mii_test"
params = {"prompts": ["DeepSpeed is", "Seattle is a place"], "max_length": 128}
json_params = json.dumps(params)
output = requests.post(
    url, data=json_params, headers={"Content-Type": "application/json"}
)
text = output.text
print(text)
json_res = json.loads(text)
assert isinstance(json_res, str)  # it's still a string, perhaps because of escape characters?
print(json_res)

The result is as below:
"{\n \"response\": [\n \"the solution for low speed, high-current IGBT switching applications that involve controlling high power from a series of IGBT modules, such as output inverters for PV, wind, motor drives, UPS, or Xenon lighting applications.\\nThe platform provides an open and modular solution for achieving fast switching times, meeting the rapid rise in demand for higher power modules. This is enabled through the modular design of the DeepSpeed core, which offers high-speed operation, reducing the number of components and improving size and cost.\\nDeepSpeed is fully compli\",\n \"I've had the pleasure of knowing, through the virtual ether, for over 15 years. I have also been fortunate enough to visit Seattle on several occasions over the years as well as being able to collaborate and visit artists' studios in the Northwest. When opportunity knocked and the folks at Art Informel extended an invitation to show at their space, I felt the stars were aligned, that this was meant to be. I hope you'll join me in Seattle for the opening this Saturday, December 10th, from 5-9PM, at\"\n ]\n}"
{
  "response": [
    "the solution for low speed, high-current IGBT switching applications that involve controlling high power from a series of IGBT modules, such as output inverters for PV, wind, motor drives, UPS, or Xenon lighting applications.\nThe platform provides an open and modular solution for achieving fast switching times, meeting the rapid rise in demand for higher power modules. This is enabled through the modular design of the DeepSpeed core, which offers high-speed operation, reducing the number of components and improving size and cost.\nDeepSpeed is fully compli",
    "I've had the pleasure of knowing, through the virtual ether, for over 15 years. I have also been fortunate enough to visit Seattle on several occasions over the years as well as being able to collaborate and visit artists' studios in the Northwest. When opportunity knocked and the folks at Art Informel extended an invitation to show at their space, I felt the stars were aligned, that this was meant to be. I hope you'll join me in Seattle for the opening this Saturday, December 10th, from 5-9PM, at"
  ]
}
Hope to get an answer again. |
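(A note on the string result above: the escaped output suggests the body is JSON that has been encoded twice, so the first json.loads returns a str. A minimal workaround sketch, assuming that double encoding:)
json_res = json.loads(text)
if isinstance(json_res, str):  # double-encoded: decode once more
    json_res = json.loads(json_res)
print(json_res["response"][0])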
@ChristineSeven can you share the full script that you are using to deploy MII? Specifically, I would like to know what model, tensor parallel settings, etc. |
@irasin can you please try the following instead?
import json
import requests

url = "http://localhost:8000/mii/mii_test"
params = {"prompts": ["DeepSpeed is", "Seattle is a place"], "max_length": 128}
json_params = json.dumps(params)
output = requests.post(
    url, data=json_params, headers={"Content-Type": "application/json"}
)
print(output.json()) |
Hi @mrwyattii, the results are the same. |
I also face the same issue: an internal error. Please help us. |
@irasin Is your Flask version <3.0.0? If so, I think I have the solution in #328. Can you try with that PR? You can install it with |
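(To check which Flask version is installed:)
pip show flask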
@cableyang can you please share the full script that you are running so that I can try to reproduce the error? Thanks |
With the latest DeepSpeed-MII commit, I can get the JSON-format output now. Thanks a lot, @mrwyattii. BTW, I wonder where I can get the benchmark scripts you used in the Performance Evaluation of https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-fastgen/README.md. I tested with the benchmark_server.py script in the vllm repo, which sends 1000 requests to the server at the same time, and I keep getting SYN flood error messages in the dmesg output, like [1021332.329430] TCP: request_sock_TCP: Possible SYN flooding on port 8000. Dropping request. Check SNMP counters. I'm curious whether there is any limit on the maximum number of connections on the server side or in the RESTful API. |
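(The SYN-flood warnings usually mean the server's listen backlog is being exhausted. Besides raising net.core.somaxconn on the server, a client-side cap on concurrency also avoids the problem. A sketch, assuming client-side throttling is acceptable; the URL and limits below are placeholders:)
import asyncio
import aiohttp

URL = "http://localhost:8000/mii/mii_test"  # placeholder endpoint
CONCURRENCY = 64  # cap on simultaneous in-flight requests

async def send_one(session, semaphore, prompt):
    async with semaphore:  # admit at most CONCURRENCY requests at once
        payload = {"prompts": [prompt], "max_length": 128}
        async with session.post(URL, json=payload) as resp:
            return await resp.text()

async def main():
    semaphore = asyncio.Semaphore(CONCURRENCY)
    # limit= bounds the connector's socket pool, so connections are reused
    # instead of 1000 new sockets being opened at once.
    connector = aiohttp.TCPConnector(limit=CONCURRENCY)
    async with aiohttp.ClientSession(connector=connector) as session:
        results = await asyncio.gather(
            *(send_one(session, semaphore, f"Prompt {i}") for i in range(1000)))
    print(len(results))

asyncio.run(main())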
Sorry for the late reply.
import argparse
import asyncio
import json
import random
import time
from typing import AsyncGenerator, List, Tuple

import aiohttp
import numpy as np

token_num = 0


def sample_requests() -> List[Tuple[str, dict]]:
    # Load the dataset (one JSON object per line).
    content_list = []
    num_all = 0
    with open("457.json", "r", encoding="utf-8") as f:
        lines = f.readlines()
    print(len(lines))
    for line in lines:
        if line:
            data = json.loads(line)
            content_list.append(data)
    print(num_all)
    print(len(content_list))
    print(content_list[0])
    print("read data set finish")
    prompts = [content["question"] for content in content_list]
    tokenized_dataset = []
    for i in range(len(content_list)):
        tokenized_dataset.append((prompts[i], content_list[i]))
    return tokenized_dataset


async def send_request(
    prompt: str,
    origin_json: dict,
) -> dict:
    global token_num
    headers = {
        # Both headers in one dict; assigning the dict twice kept only the second.
        "Content-Type": "application/json",
        "User-Agent": "Benchmark Client",
    }
    url = "http://10.10.10.10:28093/mii/mistral-deployment"
    params = {"prompts": [prompt], "max_length": 4096}
    json_params = json.dumps(params)
    timeout = aiohttp.ClientTimeout(total=3 * 3600)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        while True:
            async with session.post(url, headers=headers,
                                    data=json_params) as response:
                chunks = []
                async for chunk, _ in response.content.iter_chunks():
                    chunks.append(chunk)
            output = b"".join(chunks).decode("utf-8")
            print(output)
            try:
                # `output` is already a str, so decode it directly
                # (a str has no .json() method).
                result = json.loads(output)
                origin_json["model_answer"] = result["response"][0]
            except (json.JSONDecodeError, KeyError, TypeError):
                origin_json["model_answer"] = ""
            token_num += 1
            print(token_num)
            if "error" not in output:
                break
    return origin_json


async def get_request(
    input_requests: List[Tuple[str, dict]],
) -> AsyncGenerator[Tuple[str, dict], None]:
    # NOTE: get_request was not included in the pasted snippet; this is a
    # minimal stand-in that yields every request immediately, with no
    # arrival-rate throttling.
    for request in input_requests:
        yield request


async def batchmark(
    input_requests: List[Tuple[str, dict]],
) -> List[dict]:
    tasks: List[asyncio.Task] = []
    async for request in get_request(input_requests):
        prompt, origin_json = request
        task = asyncio.create_task(send_request(prompt, origin_json))
        tasks.append(task)
    results = await asyncio.gather(*tasks)
    return results


def main(args: argparse.Namespace):
    print(args)
    random.seed(args.seed)
    np.random.seed(args.seed)
    input_requests = sample_requests()
    batch_start_time = time.time()
    for i in range(0, len(input_requests), 50):
        total_results = asyncio.run(batchmark(input_requests[i:i + 50]))
        with open("457_deepspeed_out.json", "a+", encoding="utf-8") as f1:
            for origin_json in total_results:
                json_data = json.dumps(origin_json, ensure_ascii=False)
                f1.write(json_data + "\n")
                f1.flush()
    batch_end_time = time.time()
    print(batch_end_time - batch_start_time)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Batchmark the online serving throughput.")
    parser.add_argument("--seed", type=int, default=0)
    args = parser.parse_args()
    main(args) |
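(If the script above is saved as, say, benchmark_client.py — the filename is hypothetical — it can be run with:)
python benchmark_client.py --seed 0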
The server code is like this:
client = mii.serve(
    "mistralai/Mistral-7B-v0.1",
    deployment_name="mistral-deployment",
    enable_restful_api=True,
    restful_api_port=28080,
) |
The benchmarks we ran to collect data for our FastGen blog post can be found here: https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii
Note that we did not use the RESTful API in our benchmarks and instead used the Python API (i.e.,
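(For reference, a minimal sketch of driving a deployment through the Python API instead of REST. The deployment name is taken from the thread above; the client interface shown, mii.client / client.generate, is an assumption based on current MII:)
import mii

# Connect to an already-running deployment (name from the server script above).
client = mii.client("mistral-deployment")
# generate() is assumed to take a list of prompts plus generation kwargs.
responses = client.generate(["DeepSpeed is"], max_length=128)
print(responses)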
Great job!
But I have some problems with the restful_api test; hoping to get some help here.
Tested with commit: ddbc6fc
GPU: NVIDIA A10
Launch the service
Test with curl
And I got an error
The Python script gives the same result
Just wondering if I'm missing any hyperparameter settings?