
Queue refactor #462

Merged: 4 commits merged into main on Nov 14, 2024
Conversation

@michaelfeil (Owner) commented Nov 14, 2024

This pull request includes multiple changes to improve type hinting, optimize batch processing, and enhance queue handling in the infinity_emb library. The most important changes include the introduction of a new type hint, adjustments to queue timeouts, and modifications to batch processing logic.

Type Hinting Improvements:

  • Introduced BaseTypeHint to replace BaseTransformer in type hints, improving code clarity and flexibility.
  • Added TYPE_CHECKING import and conditional imports of type hints to avoid runtime overhead.
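The conditional-import pattern mentioned above can be sketched as follows; the `BaseTypeHint` import path and the `warmup` function are illustrative stand-ins, not the library's exact code:

```python
# Sketch of the TYPE_CHECKING pattern: the heavy transformer module is only
# imported for static analysis, never at runtime.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Hypothetical import path, following the PR description.
    from infinity_emb.transformer.abstract import BaseTypeHint


def warmup(model: "BaseTypeHint") -> None:
    # The string annotation is resolved by type checkers (mypy, pyright),
    # so no import happens when this module is loaded at runtime.
    ...
```

Because the annotation is a string, the module imports cleanly even though `BaseTypeHint` is never defined at runtime.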

Queue Handling Enhancements:

  • Changed various queue timeouts to use a constant QUEUE_TIMEOUT for consistency and easier adjustment.
  • Increased the size of _result_queue from 4 to 8 to handle more results concurrently.
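A minimal sketch of the timeout-constant pattern described above; `QUEUE_TIMEOUT` and the queue size match the PR description, but `drain_one` and the surrounding code are illustrative only:

```python
import queue

QUEUE_TIMEOUT = 0.5  # seconds; one named constant instead of scattered literals


def drain_one(q: "queue.Queue"):
    """Fetch one item, returning None if the queue stays empty past the timeout."""
    try:
        return q.get(timeout=QUEUE_TIMEOUT)
    except queue.Empty:
        return None


# The PR doubles the result queue capacity from 4 to 8.
result_queue: "queue.Queue" = queue.Queue(maxsize=8)
result_queue.put("embedding-result")
```

Centralizing the timeout in one constant means tuning it later is a one-line change instead of a search across every `get`/`put` call.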

Batch Processing Optimizations:

  • Simplified the pop_optimal_batches method by removing unnecessary sorting and returning a generator instead of a list.
  • Removed the latest_first parameter from the pop_optimal_batches call to streamline the batch selection process.
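The generator pattern above can be sketched like this; `pop_optimal_batches` here is a simplified stand-in for the library's method, not its actual implementation:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pop_optimal_batches(items: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield batches lazily so callers consume one batch at a time."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == max_batch_size:
            yield batch  # hand off a full batch without materializing the rest
            batch = []
    if batch:  # trailing partial batch
        yield batch


# Nothing is computed until the generator is iterated.
batches = pop_optimal_batches(range(5), max_batch_size=2)
```

Compared with building a full list, the generator keeps only one batch in memory at a time, which matters when the request queue is large.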

Miscellaneous:

  • Commented out the batch_delay parameter in the astart method to remove potentially unnecessary delays in batch handling.
  • Updated import statements and removed unused imports to clean up the codebase.

These changes collectively aim to improve the performance, maintainability, and clarity of the infinity_emb library.

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.

@greptile-apps greptile-apps bot left a comment
PR Summary

This PR refactors the queue and batching system in the infinity-emb library, focusing on improved type safety, memory efficiency, and more robust error handling.

  • Removed batch_delay parameter from BatchHandler initialization, potentially impacting throughput optimization
  • Changed pop_optimal_batches in /libs/infinity_emb/infinity_emb/inference/queue.py to use generator pattern for better memory efficiency
  • Increased result queue size from 4 to 8 in /libs/infinity_emb/infinity_emb/inference/batch_handler.py for improved throughput
  • Added QUEUE_TIMEOUT constant (0.5s) in /libs/infinity_emb/infinity_emb/inference/batch_handler.py for consistent timeout handling
  • Introduced BaseTypeHint in /libs/infinity_emb/infinity_emb/transformer/abstract.py for improved type safety across transformer classes

4 file(s) reviewed, 2 comment(s)

Comment on lines 88 to 92:

```diff
 self._batch_handler = BatchHandler(
     max_batch_size=self._engine_args.batch_size,
     model_replicas=self._model_replicas,
-    batch_delay=self._min_inference_t / 2,
+    # batch_delay=self._min_inference_t / 2,
     vector_disk_cache_path=self._engine_args.vector_disk_cache_path,
```

logic: Removing batch_delay could lead to aggressive batching and potential resource exhaustion. Consider adding a configurable minimum delay or documenting why this was removed.
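One shape the reviewer's suggestion could take is a configurable floor on the batching delay; `collect_batch` and `min_batch_delay` are hypothetical names, not part of infinity_emb:

```python
import asyncio


async def collect_batch(q: "asyncio.Queue", min_batch_delay: float = 0.001):
    """Wait at least min_batch_delay so concurrent requests can coalesce into one batch."""
    first = await q.get()                  # block until at least one request arrives
    await asyncio.sleep(min_batch_delay)   # give peer requests time to enqueue
    batch = [first]
    while not q.empty():                   # drain whatever accumulated during the delay
        batch.append(q.get_nowait())
    return batch
```

A small, configurable delay trades a bounded amount of latency for larger batches, and setting it to zero recovers the no-delay behavior this PR moves toward.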

```python
    BaseTransformer, BaseEmbedder, BaseTIMM, BaseAudioEmbedModel, BaseClassifer, BaseCrossEncoder
]


def run_warmup(model, inputs) -> tuple[float, float, str]:
```

style: model parameter should be typed with BaseTypeHint

@codecov-commenter commented Nov 14, 2024

⚠️ Please install the Codecov GitHub App to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.04%. Comparing base (4148a21) to head (5a28b82).

Files with missing lines Patch % Lines
...finity_emb/infinity_emb/inference/batch_handler.py 91.66% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
```text
@@            Coverage Diff             @@
##             main     #462      +/-   ##
==========================================
- Coverage   79.08%   79.04%   -0.04%
==========================================
  Files          42       42
  Lines        3414     3408       -6
==========================================
- Hits         2700     2694       -6
  Misses        714      714
```

☔ View full report in Codecov by Sentry.

@michaelfeil michaelfeil merged commit ae9e399 into main Nov 14, 2024
36 checks passed
@michaelfeil michaelfeil deleted the queue-refactor branch November 14, 2024 08:17