CUDA out of memory for 3k documents #247

Open
aaraya-rr opened this issue Sep 13, 2024 · 1 comment

aaraya-rr commented Sep 13, 2024

Similar to #205, I am experiencing an issue where CUDA runs out of memory when processing 3k documents (which are actually chunks, as I am using my own splitter).

I’ve noticed in your release notes (v0.0.8 and #173) that you mention adding documents in the 100k-500k range, so I’m curious how you achieve that without running out of memory, given that I’m hitting memory issues on a T4 GPU (15,360 MiB) with just 3k documents.

What I find interesting is that with CUDA_VISIBLE_DEVICES="" the process works and takes a relatively short time (around 3 hours). Are you still working on a solution for this, or is there any way to prevent CUDA from running out of memory? Three hours on CPU alone is workable, but with the GPU the performance should improve significantly! (A minimal sketch of the CPU-only setup is right below.)
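For reference, this is roughly how the CPU-only run is set up; the variable has to be set before torch initialises CUDA, so I set it before importing anything:

    import os

    # Hide all CUDA devices so RAGatouille/torch falls back to the CPU.
    # Must be set before torch initialises CUDA (safest: before any imports).
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

    from ragatouille import RAGPretrainedModel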

My code:

    # (Assumes: from ragatouille import RAGPretrainedModel)
    def add_documents(self, index_name, documents, metadatas):
        RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
        RAG.index(
            collection=documents,
            document_metadatas=metadatas,
            index_name=index_name,
            split_documents=False,  # documents are already chunked by my own splitter
        )

CUDA error:

PyTorch-based indexing did not succeed with error: CUDA out of memory. Tried to allocate 7.60 GiB. GPU 0 has a total capacity of 14.58 GiB of which 2.44 GiB is free. Process 16619 has 1.03 GiB memory in use. Including non-PyTorch memory, this process has 11.11 GiB memory in use. Of the allocated memory 10.87 GiB is allocated by PyTorch, and 109.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) ! Reverting to using FAISS and attempting again...
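The error message itself suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. For what it's worth, that can be set like this before anything touches CUDA, although it only mitigates fragmentation, so I'm not sure it is enough on its own:

    import os

    # Allocator hint suggested by the error above; must be set before CUDA is initialised.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"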

santiagotri commented Oct 9, 2024

Hello @aaraya-rr, same error here. I'm currently trying to build an index of around 128k documents, but I hit the CUDA out-of-memory error after roughly 1k documents.

Procedure

My documents are stored in an Elasticsearch index, and my goal is to migrate this index into RAGatouille. I’m using the index and add_to_index functions to upload the documents in batches of roughly 200: download a batch from Elasticsearch, index it in RAGatouille, then request the next batch, repeating until all documents are indexed (a sketch of the loop is below).

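Roughly, the loop looks like this (a sketch: the Elasticsearch helper and the exact keyword argument names are illustrative, not copied verbatim from my code):

    from ragatouille import RAGPretrainedModel

    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

    first_batch = True
    # fetch_batches() stands in for my Elasticsearch scroll/pagination helper (~200 docs per call)
    for batch in fetch_batches(es_client, index="source-index", size=200):
        docs = [hit["text"] for hit in batch]
        metas = [hit["metadata"] for hit in batch]
        if first_batch:
            RAG.index(collection=docs, document_metadatas=metas,
                      index_name="migrated-index", split_documents=False)
            first_batch = False
        else:
            RAG.add_to_index(new_collection=docs, new_document_metadatas=metas,
                             index_name="migrated-index", split_documents=False)
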
However, around the 4th or 5th batch, the error occurs. I also observe a curious pattern in the indexing time, which increases between batches:

  • 1st batch: 6 seconds
  • 2nd batch: 12 seconds
  • 3rd batch: 18 seconds
  • 4th batch: error

Conclusions

The behavior I described leads me to conclude that batching does not really help: most likely the previously indexed documents are also being preprocessed on every call, since the indexing time appears to follow the pattern t⋅b, where t is a time constant and b is the batch number.
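If that is what is happening, each call redoes the work of all previous batches, so the total time for B batches is roughly t⋅(1 + 2 + … + B) = t⋅B(B+1)/2, i.e. quadratic in the number of batches, and the peak memory of each call grows with everything indexed so far rather than with the ~200 documents in the current batch.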

This could explain the memory issue. The processing may not be capable of splitting the workload effectively and tries to allocate the memory needed for all documents to be indexed.

Additionally, using CPU indexing is not an option for me, as it takes approximately 5 hours per 1,000 documents, which I estimate would amount to about 26 days for 128,000 documents. :'(
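(Quick check on that estimate: 128,000 / 1,000 × 5 h = 640 h ≈ 26–27 days.)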

Potential fix (?)

I would appreciate any suggestions or guidance on potential fixes for this issue. Is there a way to optimize memory usage during the indexing process, or should I consider alternative approaches to avoid the CUDA out-of-memory error? This is probably related to #205.
