Commit bf5f121
Reduce GPU memory utilization to make sure OOM doesn't happen (#153)
zhuohan123 authored Jun 18, 2023
1 parent bec7b2d commit bf5f121
Showing 1 changed file with 1 addition and 1 deletion.
vllm/engine/arg_utils.py (1 addition, 1 deletion)

@@ -21,7 +21,7 @@ class EngineArgs:
     tensor_parallel_size: int = 1
     block_size: int = 16
     swap_space: int = 4  # GiB
-    gpu_memory_utilization: float = 0.95
+    gpu_memory_utilization: float = 0.90
     max_num_batched_tokens: int = 2560
     max_num_seqs: int = 256
     disable_log_stats: bool = False
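The change lowers the default fraction of GPU memory that the engine reserves for model weights and the KV cache, leaving more headroom so allocation spikes do not trigger out-of-memory errors. A minimal self-contained sketch of the affected defaults, mirroring only the `EngineArgs` fields visible in the diff (the real class in `vllm/engine/arg_utils.py` has many more fields and argument-parsing methods):

```python
from dataclasses import dataclass


# Sketch of the EngineArgs fields shown in the diff; not the full vLLM class.
@dataclass
class EngineArgs:
    tensor_parallel_size: int = 1
    block_size: int = 16
    swap_space: int = 4  # GiB of CPU swap space
    gpu_memory_utilization: float = 0.90  # lowered from 0.95 by this commit
    max_num_batched_tokens: int = 2560
    max_num_seqs: int = 256
    disable_log_stats: bool = False


args = EngineArgs()
print(args.gpu_memory_utilization)  # 0.9
```

Users who know their workload fits can still pass a higher value explicitly (e.g. `gpu_memory_utilization=0.95`) rather than relying on the default; the commit only changes what happens when the argument is left unset.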
