[FIX]avoid initialize process group when using a single GPU #2496

Closed

Conversation

jeejeelee
Contributor

@jeejeelee jeejeelee commented Jan 18, 2024

Refer to:
#117
#244
#565
#654

@simon-mo simon-mo self-assigned this Jan 18, 2024
@simon-mo
Collaborator

This is great! Please add a test and let me know when it is ready for review!

@jeejeelee
Contributor Author

@simon-mo, I have completed the coding and tested it with an example similar to the one below. Could you please take some time to review this PR?

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm0 = LLM(model="facebook/opt-125m", trust_remote_code=True, gpu_memory_utilization=0.3)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm0.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# Create an LLM.
llm1 = LLM(model="facebook/opt-125m", trust_remote_code=True, gpu_memory_utilization=0.5)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm1.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

P.S.: on my local machine, the model I actually tested with was chatglm3.

@jeejeelee jeejeelee changed the title [WIP]avoid initialize process group when using a single GPU [FIX]avoid initialize process group when using a single GPU Jan 19, 2024
@simon-mo simon-mo self-requested a review January 19, 2024 16:44
@jeejeelee
Contributor Author

@simon-mo I apologize for disturbing you; I would like to know what I should do next.

@simon-mo
Collaborator

Thanks for the ping. Hmm, the final approach is not what I had in mind, though. Is monkey patching necessary here? It looks like we can just change the functions inside vLLM to achieve what you wanted.

@jeejeelee
Contributor Author

@simon-mo Thanks for your review. In my approach, monkey patching is necessary.

The reason for using this approach instead of changing the functions inside vLLM is to avoid modifying too much of the original code (such as Worker._init_distributed_environment and get_tensor_model_parallel_rank in parallel_state.py). By using monkey patching, the main modification can be restricted to _init_single_gpu_config.

Of course, this is just my personal understanding. You may well have better methods or deeper considerations in mind; please let me know.
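
To make the idea concrete, the patching step could look roughly like the sketch below (the names follow the wording above, but this is only an illustration, not the exact PR code):

import types

def _init_single_gpu_config(parallel_state):
    # Hypothetical patch point: replace the tensor-parallel helpers with
    # single-GPU constants so that no torch.distributed process group is
    # created on the single-GPU path.
    parallel_state.get_tensor_model_parallel_rank = lambda: 0
    parallel_state.get_tensor_model_parallel_world_size = lambda: 1

# Illustrative usage against a stand-in for vLLM's parallel_state module:
fake_parallel_state = types.SimpleNamespace()
_init_single_gpu_config(fake_parallel_state)
assert fake_parallel_state.get_tensor_model_parallel_world_size() == 1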

@simon-mo
Collaborator

Please modify the vLLM code directly so that the end result is simple and maintainable.

@jeejeelee
Contributor Author

@simon-mo Thanks for your review. Since you don't agree with my approach, I will close this PR for now and think about how to modify the vLLM code directly.
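
For example (just an illustrative sketch, not vLLM's actual function), the direct change could be as simple as guarding the process-group setup on the world size:

import torch
import torch.distributed

def init_distributed_environment(world_size: int, rank: int, init_method: str) -> None:
    # Illustrative guard: with a single GPU there is nothing to synchronize,
    # so skip creating a process group entirely.
    if world_size == 1:
        return
    torch.distributed.init_process_group(
        backend="nccl",
        world_size=world_size,
        rank=rank,
        init_method=init_method,
    )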

@jeejeelee jeejeelee closed this Jan 24, 2024
@simon-mo
Collaborator

Thank you for understanding!
