
Add support for OpenAI API : offline batch(file) processing #699

Merged
merged 11 commits into from
Jul 29, 2024

Conversation

yichuan520030910320
Collaborator

@yichuan520030910320 yichuan520030910320 commented Jul 22, 2024

Thank you for your contribution, we really appreciate it. The following instructions will help improve your pull request and make it easier to receive feedback. If there are any items you don't understand, don't worry. Just submit the pull request and ask the maintainers for help.

Motivation

I want to support the OpenAI Batch API functionality; see the OpenAI Batch API Doc.

Running example

With this PR, the SGLang backend can seamlessly run the example code provided by OpenAI in the link above.

The frontend code is

from openai import OpenAI
import openai
import time
import json
import os


class OpenAIBatchProcessor:
    def __init__(self, api_key):
        # client = OpenAI(api_key=api_key)
        # Point the client at the local SGLang server instead of api.openai.com
        client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

        self.client = client

    def process_batch(self, input_file_path, endpoint, completion_window):

        # Upload the input file
        with open(input_file_path, "rb") as file:
            uploaded_file = self.client.files.create(file=file, purpose="batch")

        # Create the batch job
        batch_job = self.client.batches.create(
            input_file_id=uploaded_file.id,
            endpoint=endpoint,
            completion_window=completion_window,
        )

        # Monitor the batch job status
        while batch_job.status not in ["completed", "failed", "cancelled"]:
            time.sleep(3)  # Wait for 3 seconds before checking the status again
            print(
                f"Batch job status: {batch_job.status}...trying again in 3 seconds..."
            )
            batch_job = self.client.batches.retrieve(batch_job.id)

        # Check the batch job status and errors
        if batch_job.status == "failed":
            print(f"Batch job failed with status: {batch_job.status}")
            print(f"Batch job errors: {batch_job.errors}")
            return None

        # If the batch job is completed, process the results
        if batch_job.status == "completed":

            # print result of batch job
            print("batch", batch_job.request_counts)

            result_file_id = batch_job.output_file_id
            # Retrieve the file content from the server
            file_response = self.client.files.content(result_file_id)
            result_content = file_response.read()  # Read the content of the file

            # Save the content to a local file
            result_file_name = "batch_job_chat_results.jsonl"
            with open(result_file_name, "wb") as file:
                file.write(result_content)  # Write the binary content to the file
            # Load data from the saved JSONL file
            results = []
            with open(result_file_name, "r", encoding="utf-8") as file:
                for line in file:
                    json_object = json.loads(
                        line.strip()
                    )  # Parse each line as a JSON object
                    results.append(json_object)

            return results
        else:
            print(f"Batch job failed with status: {batch_job.status}")
            return None


# Initialize the OpenAIBatchProcessor
api_key = os.environ.get("OPENAI_API_KEY")
processor = OpenAIBatchProcessor(api_key)

# Process the batch job
input_file_path = "input.jsonl"
endpoint = "/v1/chat/completions"
completion_window = "24h"

# Process the batch job
results = processor.process_batch(input_file_path, endpoint, completion_window)

# Print the results
print(results)
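For reference, the input.jsonl consumed by the script above follows the OpenAI batch input file format: one JSON object per line with a custom_id, method, url, and request body. The sketch below writes a minimal two-request example (the prompts and max_tokens value are illustrative):

```python
import json

# Two hypothetical chat-completion requests in OpenAI batch input format.
requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-3.5-turbo-0125",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "List 3 NBA players."},
            ],
            "max_tokens": 256,
        },
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-3.5-turbo-0125",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "List 3 capital cities."},
            ],
            "max_tokens": 256,
        },
    },
]

# Write one JSON object per line (JSONL).
with open("input.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```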

and the output is:

Batch job status: validating...trying again in 3 seconds...
Batch job status: in_progress...trying again in 3 seconds...
Batch job status: in_progress...trying again in 3 seconds...
batch BatchRequestCounts(completed=2, failed=0, total=2)
[{'id': 'batch_req_bee263f2-0734-44cb-96c4-7fb8c4ad1661', 'custom_id': 'request-1', 'response': {'status_code': 200, 'request_id': '2a93b0ea158c42faa4efb9578ad75064', 'body': {'id': '2a93b0ea158c42faa4efb9578ad75064', 'object': 'chat.completion', 'created': 1721670579, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "  Hello there! 👋 It's my pleasure to assist you. Here are three NBA players:\n\n1. LeBron James - He is a four-time NBA champion and four-time NBA Most Valuable Player (MVP) who has played for the Cleveland Cavaliers, Miami Heat, and Los Angeles Lakers.\n2. Stephen Curry - He is a three-time NBA champion and two-time NBA MVP who has played for the Golden State Warriors. He is known for his incredible shooting ability and has won multiple awards for his skills.\n3. Kevin Durant - He is a two-time NBA champion and two-time NBA MVP who has played for the Oklahoma City Thunder and Golden State Warriors. He is known for his scoring ability and is considered one of the best players in the NBA.\n\nI hope this helps! Let me know if you have any other questions. 😊"}, 'logprobs': None, 'finish_reason': 'FINISH_MATCHED_TOKEN: 2'}], 'usage': {'prompt_tokens': 37, 'completion_tokens': 203, 'total_tokens': 240}, 'system_fingerprint': None}}, 'error': None}, {'id': 'batch_req_db9eff3b-aead-4641-bb7b-8fba9fc2626c', 'custom_id': 'request-2', 'response': {'status_code': 200, 'request_id': '3c4bf3e052184f7a8f3e37dc71c79496', 'body': {'id': '3c4bf3e052184f7a8f3e37dc71c79496', 'object': 'chat.completion', 'created': 1721670581, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "  Hello there! As an assistant, I'm happy to help. Here are three capital cities:\n\n1. Tokyo, Japan\n2. New York City, USA\n3. 
London, United Kingdom"}, 'logprobs': None, 'finish_reason': 'FINISH_MATCHED_TOKEN: 2'}], 'usage': {'prompt_tokens': 34, 'completion_tokens': 44, 'total_tokens': 78}, 'system_fingerprint': None}}, 'error': None}]

Notes

  • Here we can also support text completion; this feature (Batch API) can work together with parallel sampling.

  • Beyond adding the Batch API, we can further reorder/reschedule the requests in the file for additional optimization.

The basic design is that the output line order may not match the input line order. We can use the custom_id field, which is present in each line of the output file, to map requests in the input to results in the output.

  • Support managing uploaded files in the server (e.g., the implementation of @app.post("/v1/files") and @app.get("/v1/files/{file_id}") for creating and retrieving files in the server)
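Since the output order is not guaranteed, callers can re-align results by custom_id; a minimal sketch (the sample results below are hypothetical):

```python
# Hypothetical out-of-order batch results.
results = [
    {"custom_id": "request-2", "response": {"status_code": 200}},
    {"custom_id": "request-1", "response": {"status_code": 200}},
]

# Index each result line by its custom_id.
by_id = {r["custom_id"]: r for r in results}

# Restore the original input order.
input_order = ["request-1", "request-2"]
ordered = [by_id[cid] for cid in input_order]
```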
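The server-side file management could look roughly like the following in-memory sketch; the FileStorage class and its fields are illustrative and not the actual adapter.py implementation, though the metadata fields mirror the OpenAI file object:

```python
import time
import uuid


class FileStorage:
    """Illustrative in-memory store backing /v1/files-style endpoints."""

    def __init__(self):
        self._files = {}  # file_id -> (metadata, content)

    def create(self, filename: str, content: bytes, purpose: str = "batch") -> dict:
        # Assign a unique id and record OpenAI-style file metadata.
        file_id = f"file-{uuid.uuid4().hex}"
        metadata = {
            "id": file_id,
            "object": "file",
            "bytes": len(content),
            "created_at": int(time.time()),
            "filename": filename,
            "purpose": purpose,
        }
        self._files[file_id] = (metadata, content)
        return metadata

    def retrieve(self, file_id: str) -> dict:
        # Corresponds to GET /v1/files/{file_id}: return metadata only.
        return self._files[file_id][0]

    def content(self, file_id: str) -> bytes:
        # Corresponds to GET /v1/files/{file_id}/content: return raw bytes.
        return self._files[file_id][1]
```

A batch job would then reference the uploaded input file by its id and register its own output file the same way.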

Modification

  • Add file and batch OpenAI APIs in python/sglang/srt/server.py
  • Implement the logic for creating/retrieving/querying files and batches in python/sglang/srt/openai_api/adapter.py and manage the relationship between files and batch requests (main modification)
  • Add some data structures in python/sglang/srt/openai_api/protocol.py for convenience (all data structures/formats follow the Batch API reference and File API reference)
  • Refactor the code in python/sglang/srt/openai_api/adapter.py for function reuse
  • Add new parameters in python/sglang/srt/server.py to specify the file used to store batch serving results on the server

cc @merrymercy @Ying1123 @hnyls2002 for CR

Checklist

  1. Ensure pre-commit (pre-commit run --all-files) or other linting tools are used to fix potential lint issues.
  2. Confirm that modifications are covered by complete unit tests. If not, please add more unit tests for correctness.
  3. Modify documentation as needed, such as docstrings or example tutorials.

@yichuan520030910320
Collaborator Author

Finished the accuracy and throughput tests; it is ready to merge.

@merrymercy merrymercy merged commit 084fa54 into sgl-project:main Jul 29, 2024
1 check passed