
RuntimeError: list changed size during iteration #12

Closed
John42506176Linux opened this issue Nov 25, 2024 · 6 comments
@John42506176Linux

Hello Team,

Currently getting the following error (RuntimeError: list changed size during iteration) when trying to use the batched API for large numbers of documents (greater than 10K). Tested this with the mixedbread-ai/mxbai-embed-xsmall-v1 model. Here is the code being used:

```python
import uvicorn
import batched
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
from sentence_transformers import SentenceTransformer
from pydantic import BaseModel

app = FastAPI()

model = SentenceTransformer('mixedbread-ai/mxbai-embed-xsmall-v1')
model.encode = batched.aio.dynamically(model.encode)

class EmbeddingsRequest(BaseModel):
    input: str | list[str]

@app.post("/embeddings")  # route decorator assumed; the pasted snippet omitted it
async def embeddings(request: EmbeddingsRequest):
    return ORJSONResponse({"embeddings": await model.encode(request.input)})

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

@juliuslipp

Thank you! Could you check whether this also happens for you without FastAPI?

@juliuslipp juliuslipp self-assigned this Dec 8, 2024
@davedgd

davedgd commented Dec 10, 2024

Confirmed this today without fastapi.

@juliuslipp

The following code snippet works for me (4,194,304 inputs). Could you share a snippet that reproduces the error?

```python
from batched.aio import dynamically

@dynamically(batch_size=32, timeout_ms=10.0)
def test(items: list[str]):
    return items

# top-level await assumes an async REPL/notebook
await test([". " * 512] * 2048 * 2048)
```

@davedgd

davedgd commented Dec 13, 2024

Here's a quick example that immediately produces the error message I saw:

```python
from angle_emb import AnglE
import batched
from datasets import load_dataset

model = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy = 'cls').cuda()

model.encode = batched.dynamically(model.encode, batch_size = 64)

dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_All_Beauty", trust_remote_code=True)

embeddings1 = model.encode(dataset["full"][:100]['text'])   # works
embeddings2 = model.encode(dataset["full"][:500]['text'])   # works
embeddings3 = model.encode(dataset["full"][:50000]['text']) # fails (RuntimeError: list changed size during iteration)
```

With this example and a fresh kernel, the issue crops up pretty quickly (<20s). Without a fresh kernel, it seems one needs to adjust the exact number of items to get the error to reliably occur:

```
Exception in thread Thread-5 (_process_batches):
Traceback (most recent call last):
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/batched/batch_processor.py", line 124, in _process_batches
    for batch in self.batch_queue.optimal_batches():
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/batched/batch_generator.py", line 154, in optimal_batches
    batch_items = [self._queue._get() for _ in range(size_batches)]  # noqa: SLF001
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/site-packages/batched/batch_generator.py", line 154, in <listcomp>
    batch_items = [self._queue._get() for _ in range(size_batches)]  # noqa: SLF001
  File "/home/daved/miniforge3/envs/bert/lib/python3.10/queue.py", line 239, in _get
    return heappop(self.queue)
RuntimeError: list changed size during iteration
```
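The traceback shows the root cause: `optimal_batches` calls the queue's internal `_get()` (a raw `heappop` on the `PriorityQueue`'s underlying list) without holding the queue's lock, so a concurrent `put()` from the request side can resize the heap list mid-pop, which CPython's C `heapq` detects and reports as this exact `RuntimeError`. As a rough sketch of the lock-safe pattern (a hypothetical helper, not batched's actual API or necessarily what #15 does), draining under the queue's own mutex avoids the race:

```python
import heapq
import queue

def drain_batch(q: queue.PriorityQueue, size: int) -> list:
    """Pop up to `size` items atomically.

    Holding q.mutex (the same lock that q.put() and q.get() take
    internally) prevents a concurrent put() from resizing the underlying
    heap list in the middle of a heappop, which is what triggers
    "RuntimeError: list changed size during iteration".
    """
    with q.mutex:
        n = min(size, len(q.queue))
        return [heapq.heappop(q.queue) for _ in range(n)]
```

The trade-off is that the lock is held for the whole batch drain, briefly blocking producers, which is why skipping it looks tempting until enough concurrent traffic hits the race.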

juliuslipp added a commit that referenced this issue Dec 17, 2024
@juliuslipp

@davedgd Thanks for letting us know :) - #15 should fix the issue.

@davedgd

davedgd commented Dec 17, 2024

Awesome, thank you!
