CUDA out of memory for 3k documents #247

Open
aaraya-rr opened this issue Sep 13, 2024 · 1 comment

aaraya-rr commented Sep 13, 2024

Similar to #205, I am experiencing an issue where CUDA runs out of memory when processing 3k documents (which are actually chunks, as I am using my own splitter).

I’ve noticed in your release notes (v0.0.8 and #173) that you mention adding documents in the 100k-500k range, so I’m curious how you achieve that without running out of memory, given that I’m hitting memory issues on a T4 GPU (15,360 MiB) with just 3k documents.

What I find interesting is that with CUDA_VISIBLE_DEVICES="" the process works and takes a relatively short time (around 3 hours). Are you still working on a solution for this, or is there any way to prevent CUDA from running out of memory? Three hours on CPU alone is workable, but with the GPU the performance should improve significantly! (A minimal sketch of the CPU-only setup is right below.)
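For reference, this is roughly how the CPU-only run is set up; the variable has to be set before torch initialises CUDA, so I set it before importing anything:

    import os

    # Hide all CUDA devices so RAGatouille/torch falls back to the CPU.
    # Must be set before torch initialises CUDA (safest: before any imports).
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

    from ragatouille import RAGPretrainedModel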

My code:

    # (Assumes: from ragatouille import RAGPretrainedModel)
    def add_documents(self, index_name, documents, metadatas):
        RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
        RAG.index(
            collection=documents,
            document_metadatas=metadatas,
            index_name=index_name,
            split_documents=False,  # documents are already chunked by my own splitter
        )

CUDA error:

PyTorch-based indexing did not succeed with error: CUDA out of memory. Tried to allocate 7.60 GiB. GPU 0 has a total capacity of 14.58 GiB of which 2.44 GiB is free. Process 16619 has 1.03 GiB memory in use. Including non-PyTorch memory, this process has 11.11 GiB memory in use. Of the allocated memory 10.87 GiB is allocated by PyTorch, and 109.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) ! Reverting to using FAISS and attempting again...
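The error message itself suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. For what it's worth, that can be set like this before anything touches CUDA, although it only mitigates fragmentation, so I'm not sure it is enough on its own:

    import os

    # Allocator hint suggested by the error above; must be set before CUDA is initialised.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"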

santiagotri commented Oct 9, 2024

Hello @aaraya-rr, same error here. I'm currently trying to build an index of around 128k documents, but I hit the CUDA out-of-memory error after roughly 1k documents.

Procedure

My documents are stored in an Elasticsearch index, and my goal is to migrate this index into RAGatouille. I’m using the index and add_to_index functions to upload the documents in batches of roughly 200: download a batch from Elasticsearch, index it in RAGatouille, then request the next batch, repeating until all documents are indexed (a sketch of the loop is below).

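Roughly, the loop looks like this (a sketch: the Elasticsearch helper and the exact keyword argument names are illustrative, not copied verbatim from my code):

    from ragatouille import RAGPretrainedModel

    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

    first_batch = True
    # fetch_batches() stands in for my Elasticsearch scroll/pagination helper (~200 docs per call)
    for batch in fetch_batches(es_client, index="source-index", size=200):
        docs = [hit["text"] for hit in batch]
        metas = [hit["metadata"] for hit in batch]
        if first_batch:
            RAG.index(collection=docs, document_metadatas=metas,
                      index_name="migrated-index", split_documents=False)
            first_batch = False
        else:
            RAG.add_to_index(new_collection=docs, new_document_metadatas=metas,
                             index_name="migrated-index", split_documents=False)
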
However, around the 4th or 5th batch, the error occurs. I also observe a curious pattern in the indexing time, which increases between batches:

  • 1st batch: 6 seconds
  • 2nd batch: 12 seconds
  • 3rd batch: 18 seconds
  • 4th batch: error

Conclusions

The behavior I described leads me to conclude that batching does not really help: most likely the previously indexed documents are also being preprocessed on every call, since the indexing time appears to follow the pattern t⋅b, where t is a time constant and b is the batch number.
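If that is what is happening, each call redoes the work of all previous batches, so the total time for B batches is roughly t⋅(1 + 2 + … + B) = t⋅B(B+1)/2, i.e. quadratic in the number of batches, and the peak memory of each call grows with everything indexed so far rather than with the ~200 documents in the current batch.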

This could explain the memory issue. The processing may not be capable of splitting the workload effectively and tries to allocate the memory needed for all documents to be indexed.

Additionally, using CPU indexing is not an option for me, as it takes approximately 5 hours per 1,000 documents, which I estimate would amount to about 26 days for 128,000 documents. :'(
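(Quick check on that estimate: 128,000 / 1,000 × 5 h = 640 h ≈ 26–27 days.)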

Potential fix (?)

I would appreciate any suggestions or guidance on potential fixes for this issue. Is there a way to optimize memory usage during the indexing process, or should I consider alternative approaches to avoid the CUDA out-of-memory error? This is probably related to #205.
