forked from vllm-project/vllm
[Core] Support thread-based async tokenizer pools
vllm-project#2879 added support for using Ray to offload tokenization from the asyncio event loop. This PR extends that to support a thread pool instead of Ray and makes the thread pool the default. The default pool size is determined from the number of available CPU cores and the tensor parallel size.

The main thing to note is that each thread uses its own tokenizer instance, because the HF tokenizers are officially not thread-safe. In practice I think they are thread-safe unless you use padding/truncation, which we don't currently do but may want to soon (see for example vllm-project#3144).

Also includes some type hint additions to related parts of the code.

This replaces the original PR vllm-project#3206, which was opened before vllm-project#2879 was reworked and merged.
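The per-thread tokenizer pattern described above can be illustrated with the standard library alone. This is a minimal sketch, not the code from this commit: `ToyTokenizer`, `init_worker`, `encode_local`, and `run_demo` are hypothetical names, and the toy tokenizer simply stands in for an object that should not be shared across threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer that is not thread-safe."""

    def encode(self, text):
        return [ord(ch) for ch in text]


# Each worker thread sees its own attributes on this object.
local = threading.local()


def init_worker():
    # Runs once in every worker thread before it picks up any work.
    local.tokenizer = ToyTokenizer()


def encode_local(text):
    # Also return the instance id so the caller can see which copy was used.
    return id(local.tokenizer), local.tokenizer.encode(text)


def run_demo(max_workers=4, n_requests=16):
    with ThreadPoolExecutor(max_workers=max_workers,
                            thread_name_prefix="tokenizer_thread",
                            initializer=init_worker) as executor:
        return list(executor.map(encode_local, ["hi"] * n_requests))


results = run_demo()
instance_ids = {instance_id for instance_id, _ in results}
```

Every request is encoded by whichever tokenizer belongs to the worker thread that ran it, so no instance is touched by two threads at once. The number of distinct instances never exceeds `max_workers`, because the executor starts at most that many threads.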
Showing 13 changed files with 152 additions and 69 deletions.
37 changes: 37 additions & 0 deletions
vllm/transformers_utils/tokenizer_group/thread_tokenizer_group.py
@@ -0,0 +1,37 @@

    import threading
    from concurrent.futures import ThreadPoolExecutor

    from vllm.logger import init_logger
    from vllm.transformers_utils.tokenizer_group.tokenizer_group import (
        TokenizerGroup)
    from vllm.utils import make_async

    logger = init_logger(__name__)


    class ThreadPoolTokenizerGroup(TokenizerGroup):
        """A threadpool of TokenizerGroups for async tokenization."""

        def __init__(self, *args, max_workers: int, **tokenizer_config):
            super().__init__(*args, **tokenizer_config)
            self.local = threading.local()

            def init_tokenizer():
                logger.info(
                    f"Starting tokenizer thread {threading.current_thread().name}")
                self.local.tokenizer = TokenizerGroup(*args, **tokenizer_config)

            self.executor = ThreadPoolExecutor(
                max_workers=max_workers,
                thread_name_prefix='tokenizer_thread',
                initializer=init_tokenizer,
            )

            self.encode_async = make_async(self._encode_local, self.executor)

        def _encode_local(self, *args, **kwargs):
            return self.local.tokenizer.encode(*args, **kwargs)

        def encode(self, *args, **kwargs):
            return self.executor.submit(self._encode_local, *args,
                                        **kwargs).result()
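The body of `make_async` from `vllm.utils` is not part of this diff. As a rough illustration of how a blocking `encode` call can be handed to an executor and awaited from the event loop, here is a sketch built on `loop.run_in_executor`. The names `make_async_sketch` and `blocking_encode` are hypothetical, and this is an assumption about the general technique, not vLLM's implementation.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def make_async_sketch(func, executor):
    """Hypothetical helper: wrap a blocking function so it can be awaited."""

    def _async_wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # run_in_executor only forwards positional args, so bind kwargs first.
        return loop.run_in_executor(executor, partial(func, *args, **kwargs))

    return _async_wrapper


def blocking_encode(prompt, add_prefix=False):
    # Stand-in for a CPU-bound tokenizer call.
    ids = [ord(ch) for ch in prompt]
    return [0] + ids if add_prefix else ids


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        encode_async = make_async_sketch(blocking_encode, executor)
        # Both calls run in worker threads while the event loop stays free.
        return await asyncio.gather(
            encode_async("ab"),
            encode_async("c", add_prefix=True),
        )


encoded = asyncio.run(main())
```

The wrapper returns a future tied to the running loop, so callers can `await` it or combine several calls with `asyncio.gather`, while the synchronous `encode` path can still submit to the same executor and block on `.result()`.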