RuntimeError: list changed size during iteration #12
Hello Team,

I'm currently getting the following error (RuntimeError: list changed size during iteration) when using the batched API on large numbers of documents (greater than 10K). Tested with the mixedbread-ai/mxbai-embed-xsmall-v1 model. Here is the code being used:

```python
import uvicorn
import batched
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

model = SentenceTransformer('mixedbread-ai/mxbai-embed-xsmall-v1')
model.encode = batched.aio.dynamically(model.encode)

class EmbeddingsRequest(BaseModel):
    input: str | list[str]

@app.post("/embeddings")
async def embeddings(request: EmbeddingsRequest):
    return ORJSONResponse({"embeddings": await model.encode(request.input)})

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
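For reference, the failure is triggered by a single request carrying a large batch. A hypothetical client call (assuming the route above is mounted at /embeddings and the server runs on the default host/port):

```python
# Hypothetical client call: the endpoint path and payload shape mirror the
# server sketch above; a single request with >10K documents triggers the error.
import requests

docs = ["some document text"] * 50_000
resp = requests.post("http://localhost:8000/embeddings", json={"input": docs})
resp.raise_for_status()
embeddings = resp.json()["embeddings"]
```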
Thank you! Could you check if this also happens for you without fastapi?

Confirmed this today without fastapi.
The following code snippet works for me (4,194,304 inputs). Could you share a snippet to reproduce?

```python
from batched.aio import dynamically

@dynamically(batch_size=32, timeout_ms=10.0)
def test(items: list[str]):
    return items

await test([". " * 512] * 2048 * 2048)
```
Here's a quick example that immediately produces the error message I saw:

```python
from angle_emb import AnglE
import batched
from datasets import load_dataset

model = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
model.encode = batched.dynamically(model.encode, batch_size=64)

dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_All_Beauty", trust_remote_code=True)

embeddings1 = model.encode(dataset["full"][:100]['text'])    # works
embeddings2 = model.encode(dataset["full"][:500]['text'])    # works
embeddings3 = model.encode(dataset["full"][:50000]['text'])  # fails (RuntimeError: list changed size during iteration)
```

With this example and a fresh kernel, the issue crops up quickly (<20s). Without a fresh kernel, it seems one needs to adjust the exact number of items to get the error to occur reliably:

```
Exception in thread Thread-5 (_process_batches):
Traceback (most recent call last):
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/batched/batch_processor.py", line 124, in _process_batches
    for batch in self.batch_queue.optimal_batches():
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/batched/batch_generator.py", line 154, in optimal_batches
    batch_items = [self._queue._get() for _ in range(size_batches)]  # noqa: SLF001
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/batched/batch_generator.py", line 154, in <listcomp>
    batch_items = [self._queue._get() for _ in range(size_batches)]  # noqa: SLF001
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/queue.py", line 239, in _get
    return heappop(self.queue)
RuntimeError: list changed size during iteration
```
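This isn't stated in the thread, but the traceback suggests a race: `optimal_batches` pops items through the queue's private `_get()` (a raw `heappop` on the underlying `q.queue` list) without holding the queue's mutex, while `put()` calls from other threads keep pushing onto that same list. CPython's `heappop` raises exactly this `RuntimeError` when the heap list is resized mid-sift. Below is a minimal stdlib-only sketch of that failure mode; it deliberately bypasses the lock the way the traceback suggests, and is not batched's actual code:

```python
import queue
import threading

q = queue.PriorityQueue()

def producer():
    # put() acquires q.mutex, then heappush-es onto the shared list q.queue.
    for i in range(10_000_000):
        q.put(i)

t = threading.Thread(target=producer)
t.start()

# Popping through the private _get() bypasses q.mutex entirely, so heappop()
# can observe q.queue being resized by a concurrent put() mid-sift.
while t.is_alive() or q.qsize() > 0:
    if q.qsize() > 0:
        q._get()  # may raise RuntimeError: list changed size during iteration

t.join()
```

If that reading is right, taking `self._queue.mutex` around the pops (or draining via the public, locking `get_nowait()`) would serialize them against `put()` and avoid the error.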
Awesome, thank you!