[Bug]: Resource leak in delete_collection (Single-Node Chroma) #3296

Open
tazarov opened this issue Dec 13, 2024 · 0 comments · May be fixed by #3297
Labels: bug, by-chroma, Local Chroma


tazarov commented Dec 13, 2024

What happened?

Deleting a collection leaks both memory and file handles, and the leaked resources are never released.

The bug is easy to reproduce:

import os
import uuid

import chromadb
import numpy as np
import psutil
from chromadb.db.system import SysDB
from chromadb.segment import SegmentType

PERSIST_DIR = "delete_resource_leak"

client = chromadb.PersistentClient(PERSIST_DIR)
col = client.get_or_create_collection("delete_resource_leak")

process = psutil.Process()
# Open file handles before any data is added
print(process.open_files())

embeddings = np.random.uniform(-1, 1, (100, 1536))
docs = [f"doc_{i}" for i in range(100)]
ids = [str(uuid.uuid4()) for _ in range(100)]
col.add(embeddings=embeddings, ids=ids, documents=docs)

# Locate the persisted HNSW (vector) segment through the private SysDB handle
sysdb: SysDB = client._server._sysdb  # type: ignore
segments = sysdb.get_segments(collection=col.id)
assert len(segments) == 2
vector_segment = [s for s in segments if s["type"] == SegmentType.HNSW_LOCAL_PERSISTED.value][0]
segment_dir = os.path.join(PERSIST_DIR, str(vector_segment["id"]))
assert os.path.exists(segment_dir)

client.delete_collection(col.name)

# Open file handles after the collection is deleted
print(process.open_files())
# This assertion fails: the segment directory still exists
assert not os.path.exists(segment_dir)
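
The raw process.open_files() output can be noisy, so it may help to narrow the check to handles that point into the persist directory. The sketch below is one way to do that with the standard library only. It assumes Linux (it reads /proc/self/fd), and the helper name open_handles_under is hypothetical, not part of Chroma or psutil.

```python
import os


def open_handles_under(directory):
    """List targets of this process's open file descriptors under `directory`.

    Assumption: Linux only, since it relies on /proc/self/fd.
    """
    root = os.path.realpath(directory)
    fd_dir = "/proc/self/fd"
    found = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor was closed between listdir() and readlink()
            continue
        # A handle to an already-unlinked file shows up as "<path> (deleted)",
        # which still starts with the directory prefix and is still reported.
        if target == root or target.startswith(root + os.sep):
            found.append(target)
    return sorted(found)
```

Calling open_handles_under("delete_resource_leak") before and after client.delete_collection(...) should make it easier to see whether any handles into the persist directory survive the deletion.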

After a collection is deleted, the HNSW segment directory is not removed and the related file handles are not released. The issue was caused by a change introduced in 0.5.21. The diagram below sums up the issue:

[Image: diagram summarizing the issue]
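
For a quick check that does not go through the private SysDB handle, the persist directory can be scanned for segment directories left behind on disk. This is a minimal sketch, assuming segment directories are named after their UUID-formatted segment id (as in the repro, where the path is built from str(vector_segment["id"])) and that nothing else in the directory has a UUID name. The helper name leftover_segment_dirs is hypothetical.

```python
import os
import uuid


def leftover_segment_dirs(persist_dir):
    """Return names of sub-directories of `persist_dir` that parse as UUIDs."""
    leftovers = []
    for entry in sorted(os.listdir(persist_dir)):
        path = os.path.join(persist_dir, entry)
        if not os.path.isdir(path):
            continue  # skip regular files such as the SQLite database
        try:
            uuid.UUID(entry)
        except ValueError:
            continue  # directory name is not a UUID, so not treated as a segment
        leftovers.append(entry)
    return leftovers
```

With a single collection in the persist directory, leftover_segment_dirs("delete_resource_leak") would be expected to return an empty list after delete_collection. On affected versions it should still list the HNSW segment directory.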

Versions

Chroma version >0.5.20

Relevant log output

No response
