TokenEncoder #838
```diff
@@ -1,12 +1,25 @@
 from jinja2 import Environment, StrictUndefined
 from tiktoken import encoding_for_model, get_encoding

 from pr_agent.config_loader import get_settings
+from threading import Lock


+class TokenEncoder:
+    _encoder_instance = None
+    _model = None
+    _lock = Lock() # Create a lock object
+
```
Comment on lines +7 to 11
/ask is there an alternative for lock?
Yes, there are alternatives to using a …
Each of these synchronization primitives has its own use cases, and choosing the right one depends on the specific requirements of your application. For instance, if you need a simple mutual exclusion, a …
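As one illustration of how synchronization primitives differ, here is a minimal sketch using only Python's standard `threading` module. It compares `Lock` with `RLock`; the helper name `reenter` is hypothetical and not part of this PR, and the choice of these two primitives is an assumption made for the example rather than a restatement of the reply above.

```python
import threading

plain_lock = threading.Lock()
reentrant_lock = threading.RLock()


def reenter(lock):
    """Try to acquire `lock` a second time from the thread that already holds it."""
    with lock:
        # Non-blocking attempt, so a plain Lock reports failure instead of deadlocking.
        acquired_again = lock.acquire(blocking=False)
        if acquired_again:
            lock.release()  # Undo the extra acquisition before leaving the `with` block
        return acquired_again


print(reenter(plain_lock))      # False: a Lock cannot be re-acquired by its holder
print(reenter(reentrant_lock))  # True: an RLock can be re-acquired by the owning thread
```

For the class in this diff, which acquires the lock once and never re-enters it, plain mutual exclusion appears to be sufficient.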
```diff
+    @classmethod
+    def get_token_encoder(cls):
+        model = get_settings().config.model
+        if cls._encoder_instance is None or model != cls._model: # Check without acquiring the lock for performance
+            with cls._lock: # Lock acquisition to ensure thread safety
+                if cls._encoder_instance is None or model != cls._model:
+                    cls._model = model
+                    cls._encoder_instance = encoding_for_model(cls._model) if "gpt" in cls._model else get_encoding(
+                        "cl100k_base")
+        return cls._encoder_instance
+
-def get_token_encoder():
-    return encoding_for_model(get_settings().config.model) if "gpt" in get_settings().config.model else get_encoding(
-        "cl100k_base")
-
 class TokenHandler:
     """
```
```diff
@@ -31,7 +44,7 @@ def __init__(self, pr=None, vars: dict = {}, system="", user=""):
         - system: The system string.
         - user: The user string.
         """
-        self.encoder = get_token_encoder()
+        self.encoder = TokenEncoder.get_token_encoder()
         if pr is not None:
             self.prompt_tokens = self._get_system_user_tokens(pr, self.encoder, vars, system, user)

```
Changes walkthrough
token_handler.py (+18/-5)
Implement Singleton Pattern for Token Encoding
pr_agent/algo/token_handler.py
Added the TokenEncoder singleton class for thread-safe token encoding.
… of the encoder.
Updated TokenHandler to use the TokenEncoder.get_token_encoder() method.
utils.py (+2/-2)
Update Utils to Use TokenEncoder Singleton
pr_agent/algo/utils.py
Replaced get_token_encoder with TokenEncoder.get_token_encoder() for consistency.
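
The thread-safe singleton summarized in the walkthrough can be tried in isolation with a small, self-contained sketch of double-checked locking. Everything named here (`CachedEncoder`, `_build`, `build_count`, and the string used in place of a real encoder) is hypothetical and stands in for the PR's `TokenEncoder` and the tiktoken calls, so the example runs without tiktoken or the project's settings.

```python
from threading import Lock, Thread


class CachedEncoder:
    """Sketch of the double-checked locking pattern; not the PR's actual class."""
    _instance = None
    _model = None
    _lock = Lock()
    build_count = 0  # Counts how often the factory ran (illustration only)

    @classmethod
    def _build(cls, model):
        # Hypothetical stand-in for tiktoken's encoding_for_model / get_encoding.
        cls.build_count += 1
        return f"encoder-for-{model}"

    @classmethod
    def get(cls, model):
        # First check without the lock keeps the common path cheap.
        if cls._instance is None or model != cls._model:
            with cls._lock:
                # Second check: another thread may have built it while we waited.
                if cls._instance is None or model != cls._model:
                    cls._instance = cls._build(model)
                    cls._model = model
        return cls._instance


results = []
threads = [Thread(target=lambda: results.append(CachedEncoder.get("gpt-4"))) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(CachedEncoder.build_count)  # 1: the factory ran once despite eight callers
print(set(results))               # {'encoder-for-gpt-4'}
```

One deliberate difference from the diff: the sketch assigns the instance before the model name, so the unlocked check cannot observe the new model name paired with the previous encoder. That ordering is a choice made for this sketch, not something the PR states.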