feat: Process large docs in batches #7

Status: Merged (3 commits, May 28, 2024)

ashraq1455 commented on May 21, 2024:

  • Separated the chunking logic in StatisticalChunker into a dedicated _chunk method and added batch processing to reduce memory usage when handling large documents with many splits.
  • Added an enforce_max_tokens parameter to _chunk so that splits stay within the max_split_tokens limit when _chunk is called directly rather than through __call__.
  • Fixes #6 (Large document fix).
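The batching idea in the first bullet can be sketched roughly as follows. This is a minimal, self-contained sketch under stated assumptions, not the StatisticalChunker code from this PR: chunk_batch, chunk_in_batches, letter_embed, threshold, and batch_size are hypothetical names; a toy letter-count embedding stands in for a real encoder; and carrying the last, possibly unfinished chunk of each batch into the next batch is an assumed way of stitching batches together so that a batch boundary does not force an artificial chunk break.

```python
import math
from typing import Callable, List

Vector = List[float]


def letter_embed(texts: List[str]) -> List[Vector]:
    # Toy embedding (assumption): 26-dim lowercase letter counts, standing in for a real encoder.
    return [[float(t.count(chr(ord("a") + i))) for i in range(26)] for t in texts]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def chunk_batch(splits: List[str], embed: Callable[[List[str]], List[Vector]],
                threshold: float) -> List[List[str]]:
    """Group consecutive splits; start a new chunk when similarity drops below threshold."""
    if not splits:
        return []
    vectors = embed(splits)  # only this batch's vectors are held in memory
    chunks = [[splits[0]]]
    for prev, cur, text in zip(vectors, vectors[1:], splits[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append([text])
        else:
            chunks[-1].append(text)
    return chunks


def chunk_in_batches(splits: List[str], embed: Callable[[List[str]], List[Vector]],
                     threshold: float, batch_size: int = 500) -> List[List[str]]:
    """Hypothetical batched driver: embed and chunk at most ~2 * batch_size splits at a time."""
    chunks: List[List[str]] = []
    carry: List[str] = []  # last chunk of the previous batch, which may continue
    for start in range(0, len(splits), batch_size):
        batch = carry + splits[start:start + batch_size]
        batch_chunks = chunk_batch(batch, embed, threshold)
        if start + batch_size >= len(splits):
            chunks.extend(batch_chunks)  # final batch: keep every chunk
            carry = []
        else:
            chunks.extend(batch_chunks[:-1])
            last = batch_chunks[-1]
            if len(last) >= batch_size:
                chunks.append(last)  # cap the carry so batches stay bounded
                carry = []
            else:
                carry = last  # re-chunk it together with the next batch
    return chunks


if __name__ == "__main__":
    docs = ["aaa", "aab", "zzz", "zzy", "aaa"]
    print(chunk_in_batches(docs, letter_embed, threshold=0.5, batch_size=3))
    # [['aaa', 'aab'], ['zzz', 'zzy'], ['aaa']]
```

In this example the batch boundary falls between "zzz" and "zzy"; because "zzz" is carried into the second batch, the two still end up in one chunk, matching what unbatched chunking would produce. The embedding vectors held at any one time scale with batch_size rather than with document length; the trade-off is that carried-over splits are embedded twice.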
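The enforce_max_tokens behavior described in the second bullet amounts to re-splitting any over-long split before chunking. A minimal sketch, assuming whitespace-separated words as a stand-in for real tokens; cap_split_tokens and prepare_splits are hypothetical helpers, and only the names enforce_max_tokens and max_split_tokens come from the PR description.

```python
from typing import List


def cap_split_tokens(splits: List[str], max_split_tokens: int) -> List[str]:
    """Break any split longer than max_split_tokens into consecutive pieces.

    Assumption: tokens are whitespace-separated words; a real chunker would
    count tokens with its encoder's tokenizer instead.
    """
    if max_split_tokens < 1:
        raise ValueError("max_split_tokens must be at least 1")
    result: List[str] = []
    for split in splits:
        words = split.split()
        if len(words) <= max_split_tokens:
            result.append(split)  # already within the limit, keep unchanged
            continue
        # Over-long split: emit windows of at most max_split_tokens words,
        # rejoined with single spaces.
        for i in range(0, len(words), max_split_tokens):
            result.append(" ".join(words[i:i + max_split_tokens]))
    return result


def prepare_splits(splits: List[str], max_split_tokens: int,
                   enforce_max_tokens: bool = False) -> List[str]:
    """Hypothetical pre-processing step: only re-split when the flag is set,
    e.g. when the caller supplies its own splits instead of going through a
    splitter that already respects max_split_tokens."""
    if enforce_max_tokens:
        return cap_split_tokens(splits, max_split_tokens)
    return list(splits)


if __name__ == "__main__":
    print(prepare_splits(["a b c d e", "f"], 2, enforce_max_tokens=True))
    # ['a b', 'c d', 'e', 'f']
```

Keeping the check behind a flag means the default path does no extra work when the splits already come from a splitter that honors the limit.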


ashraq1455 marked this pull request as ready for review on May 26, 2024 at 09:02
ashraq1455 requested a review from jamescalam on May 26, 2024 at 09:03
jamescalam merged commit 0cd1186 into main on May 28, 2024
8 checks passed
jamescalam deleted the stat-chunker-batch-fix branch on May 28, 2024 at 04:35
ashraq1455 mentioned this pull request on Jun 13, 2024
Development: merging this pull request may close the linked issue "Large document fix".
2 participants