Queue refactor #462
Conversation
PR Summary
This PR refactors the queue and batching system in the infinity-emb library, focusing on improved type safety, memory efficiency, and more robust error handling.
- Removed the `batch_delay` parameter from `BatchHandler` initialization, potentially impacting throughput optimization
- Changed `pop_optimal_batches` in `/libs/infinity_emb/infinity_emb/inference/queue.py` to use a generator pattern for better memory efficiency
- Increased the result queue size from 4 to 8 in `/libs/infinity_emb/infinity_emb/inference/batch_handler.py` for improved throughput
- Added a `QUEUE_TIMEOUT` constant (0.5s) in `/libs/infinity_emb/infinity_emb/inference/batch_handler.py` for consistent timeout handling
- Introduced `BaseTypeHint` in `/libs/infinity_emb/infinity_emb/transformer/abstract.py` for improved type safety across transformer classes
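The generator change described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (yield one batch at a time instead of materializing all batches up front), not the actual `pop_optimal_batches` implementation from infinity-emb:

```python
from typing import Generator, List

def pop_optimal_batches(
    queue: List[dict], max_batch_size: int
) -> Generator[List[dict], None, None]:
    """Yield batches of up to max_batch_size items, holding only one in memory.

    Illustrative sketch only; the real method operates on infinity-emb's
    internal priority queue, not a plain list.
    """
    while queue:
        batch = queue[:max_batch_size]
        del queue[:max_batch_size]
        yield batch

# The caller consumes batches lazily instead of receiving a full list:
items = [{"id": i} for i in range(5)]
for batch in pop_optimal_batches(items, max_batch_size=2):
    print(len(batch))  # 2, 2, 1
```

Because the generator is consumed one batch at a time, peak memory scales with a single batch rather than with the whole backlog.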
4 file(s) reviewed, 2 comment(s)
```diff
  self._batch_handler = BatchHandler(
      max_batch_size=self._engine_args.batch_size,
      model_replicas=self._model_replicas,
-     batch_delay=self._min_inference_t / 2,
+     # batch_delay=self._min_inference_t / 2,
      vector_disk_cache_path=self._engine_args.vector_disk_cache_path,
```
logic: Removing batch_delay could lead to aggressive batching and potential resource exhaustion. Consider adding a configurable minimum delay or documenting why this was removed.
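The reviewer's suggestion of a configurable minimum delay could look roughly like the sketch below. The environment variable name, default value, and `collect_batch` helper are all hypothetical, introduced here only to illustrate the idea:

```python
import asyncio
import os

# Hypothetical knob: a small, configurable delay that lets requests
# accumulate before a batch is cut, instead of removing the delay entirely.
MIN_BATCH_DELAY_S = float(os.environ.get("INFINITY_MIN_BATCH_DELAY", "0.002"))

async def collect_batch(queue: asyncio.Queue, max_batch_size: int) -> list:
    """Wait briefly for requests to accumulate, then drain up to a full batch."""
    item = await queue.get()  # block until at least one request arrives
    if MIN_BATCH_DELAY_S > 0:
        await asyncio.sleep(MIN_BATCH_DELAY_S)  # give later requests a chance
    batch = [item]
    while len(batch) < max_batch_size and not queue.empty():
        batch.append(queue.get_nowait())
    return batch
```

Keeping the delay configurable (with 0 as an opt-out) preserves the old throughput-tuning behavior while defaulting to the new, more aggressive batching.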
```python
    BaseTransformer, BaseEmbedder, BaseTIMM, BaseAudioEmbedModel, BaseClassifer, BaseCrossEncoder
]


def run_warmup(model, inputs) -> tuple[float, float, str]:
```
style: model parameter should be typed with BaseTypeHint
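Addressing this comment might look like the sketch below: a `Union` alias over the model base classes, used to annotate `model`. The stub classes here are placeholders standing in for infinity-emb's real base classes, and the `run_warmup` body is invented for illustration:

```python
from typing import Union

# Placeholder stubs; in infinity-emb these are the real abstract base classes.
class BaseTransformer: ...
class BaseEmbedder: ...
class BaseCrossEncoder: ...

# One alias covering all supported model base classes keeps signatures short.
BaseTypeHint = Union[BaseTransformer, BaseEmbedder, BaseCrossEncoder]

def run_warmup(model: BaseTypeHint, inputs: list) -> tuple[float, float, str]:
    """Typed per the review suggestion: model may be any supported base class."""
    # Dummy body for illustration; returns (inference_time, total_time, log_msg).
    return (0.0, 0.0, type(model).__name__)
```

A type checker will then flag callers that pass anything outside the union, which is the safety the review comment is asking for.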
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #462      +/-   ##
==========================================
- Coverage   79.08%   79.04%    -0.04%
==========================================
  Files          42       42
  Lines        3414     3408        -6
==========================================
- Hits         2700     2694        -6
  Misses        714      714
```

☔ View full report in Codecov by Sentry.
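The `QUEUE_TIMEOUT` constant mentioned in the summary (0.5s) is typically used to bound blocking queue reads so shutdown flags are re-checked regularly. The consumer loop below is a hypothetical sketch of that pattern, not infinity-emb's actual batch handler:

```python
import queue
import threading

QUEUE_TIMEOUT = 0.5  # seconds; matches the constant described in the PR summary

def drain(q: "queue.Queue", stop: threading.Event) -> list:
    """Poll the queue with a bounded wait instead of blocking forever."""
    results = []
    while not stop.is_set():
        try:
            results.append(q.get(timeout=QUEUE_TIMEOUT))
        except queue.Empty:
            continue  # timed out: loop back and re-check the stop flag
    return results
```

Centralizing the timeout in one constant, as the PR does, means every consumer loop reacts to shutdown within the same bounded window and the value can be tuned in one place.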
This pull request includes multiple changes to improve type hinting, optimize batch processing, and enhance queue handling in the `infinity_emb` library. The most important changes include the introduction of a new type hint, adjustments to queue timeouts, and modifications to batch processing logic.

Type Hinting Improvements:
- Introduced `BaseTypeHint` to replace `BaseTransformer` in type hints, improving code clarity and flexibility. [1] [2] [3]
- Added a `TYPE_CHECKING` import and conditional imports of type hints to avoid runtime overhead.

Queue Handling Enhancements:
- Added a `QUEUE_TIMEOUT` constant for consistency and easier adjustments. [1] [2] [3] [4] [5] [6]
- Increased the size of `_result_queue` from 4 to 8 to handle more results concurrently.

Batch Processing Optimizations:
- Optimized the `pop_optimal_batches` method by removing unnecessary sorting and returning a generator instead of a list. [1] [2] [3]
- Removed the `latest_first` parameter from the `pop_optimal_batches` call to streamline the batch selection process.

Miscellaneous:
- Commented out the `batch_delay` parameter in the `astart` method to potentially remove unnecessary delays in batch handling.

These changes collectively aim to improve the performance, maintainability, and clarity of the `infinity_emb` library.

<!-- Congratulations! You've made it this far! Thanks for submitting a PR to Infinity!
License & CLA
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.
-->
Related Issue
Checklist
Additional Notes