Correct Integration of TT-Embedding to DLRM #10
Hi @TimJZ,
Hence the shape of the three tensor cores would be: Specifying
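For anyone following along, here is a rough sketch of how the three core shapes fit together, assuming the standard tensor-train convention (core `i` has shape `(rank[i], p[i], q[i], rank[i+1])` with boundary ranks fixed at 1). The helper name and example numbers are illustrative, not from this thread:

```python
# Sketch: shapes of the TT cores for a factorized embedding table,
# assuming the usual tensor-train convention where core i has shape
# (rank[i], p[i], q[i], rank[i+1]) and the boundary ranks are 1.
def tt_core_shapes(tt_p_shapes, tt_q_shapes, tt_ranks):
    assert len(tt_p_shapes) == len(tt_q_shapes)
    ranks = [1] + list(tt_ranks) + [1]
    assert len(ranks) == len(tt_p_shapes) + 1
    return [
        (ranks[i], tt_p_shapes[i], tt_q_shapes[i], ranks[i + 1])
        for i in range(len(tt_p_shapes))
    ]

# Example: a 125,000 x 64 table factorized as (50, 50, 50) x (4, 4, 4)
# with internal ranks (12, 14) -> three cores.
print(tt_core_shapes([50, 50, 50], [4, 4, 4], [12, 14]))
# [(1, 50, 4, 12), (12, 50, 4, 14), (14, 50, 4, 1)]
```

Note that the product of the `p` factors covers the number of rows (50 * 50 * 50 = 125,000) and the product of the `q` factors covers the embedding dimension (4 * 4 * 4 = 64).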
Thank you very much for your response! I've tried it with the parameters you mentioned, but I'm still getting the same error. I suspect this is a version-specific error. Could you please tell me which version of DLRM you were using when testing TT-Embedding? Thanks!
For the latest version of @facebookresearch/DLRM (1302c71624fa9dbe7f0c75fea719d5e58d33e059), this patch made it work for me:
And I ran it with a command like this: Note that this makes all embeddings TTEmbedding; you can make some of them TT by changing the
Hi, I have been facing the "RuntimeError: CUDA error: an illegal memory access was encountered" error as well. I have tried PyTorch 1.8.0 with CUDA 11.0 and with 10.2; both result in the same error:
Here is the command I used: While printing the contents of sparse_index_group_batch, sparse_offset_group_batch, and the embedding output to see what the possible issue could be, I observed that the error occurs when the current batch has many zero values in the sparse_index_group_batch tensor. Not sure if it's related, but I wanted to mention it in case it helps. I would really appreciate it if you could help me find out what the issue might be. Thanks
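A general debugging tip for errors like this: illegal memory access in embedding kernels is very often an out-of-range index or a malformed offsets tensor, and running with `CUDA_LAUNCH_BLOCKING=1` makes the failing kernel easier to identify. A minimal CPU-side sanity check, sketched here with an assumed `num_embeddings` variable (none of this is from the thread):

```python
# Sketch: sanity-check a sparse index/offset batch on the CPU before it
# reaches the CUDA kernel. Out-of-range indices or non-monotonic offsets
# are a common cause of "illegal memory access" in embedding lookups.
# `num_embeddings` is the row count of the table being probed.
def check_batch(indices, offsets, num_embeddings):
    problems = []
    if any(i < 0 or i >= num_embeddings for i in indices):
        problems.append("index out of range [0, num_embeddings)")
    if any(offsets[j] > offsets[j + 1] for j in range(len(offsets) - 1)):
        problems.append("offsets are not non-decreasing")
    if offsets and offsets[-1] > len(indices):
        problems.append("last offset exceeds len(indices)")
    return problems

print(check_batch([0, 3, 7], [0, 2, 3], num_embeddings=8))  # []
print(check_batch([0, 3, 9], [0, 2, 3], num_embeddings=8))  # flags index 9
```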
Update: the illegal memory access was being caused by the smaller embedding tables, which have very few entries. By applying TTEmbedding only to the bigger embeddings, I was able to get through the apply_emb function. However, I am now facing a new issue when calling torch.cuda.synchronize(): Would it be possible to point me to how to reproduce the exact results from the paper for the Terabyte dataset? Thanks
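The workaround described above (compressing only the big tables) can be sketched as a simple per-table decision; the threshold value and function name here are assumptions to illustrate the idea, not from the repo:

```python
# Sketch of the workaround above: decide per-table whether to use a
# TT-compressed embedding, keeping small tables as plain nn.EmbeddingBag.
# The threshold is an assumption; tune it per dataset.
TT_THRESHOLD = 200_000  # rows; tables below this stay uncompressed

def use_tt(table_sizes, threshold=TT_THRESHOLD):
    return [n >= threshold for n in table_sizes]

# Criteo-style mix of tiny and huge tables (illustrative numbers only):
sizes = [3, 27, 14992, 5461306, 40000000]
print(use_tt(sizes))  # [False, False, False, True, True]
```

Small tables gain almost nothing from TT compression anyway, so skipping them costs little memory.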
Thank you very much for your reply!
Could you please give me some insight into what might be going wrong? I'm using pytorch=1.6.0a0+9907a3e and cuda=11.0.167, since in PyTorch 1.6 there's no appropriate API for Thanks!
@TimJZ it looks like you are using two devices. We have only tested DLRM on a single GPU so far; it should fit on a single device with 16 GB of memory when training DLRM with the Terabyte and Kaggle datasets. Can you try running on a single device (i.e. by setting
I've tried it on a single GPU, but I'm constantly getting the illegal memory access error after the for loop runs 6 times:
The GPU I'm using is a Tesla V100-SXM2 with 32 GB of memory.
If you have --mlperf-logging in your arguments, remove it. I was facing the same issue, and it seems to be caused by enabling MLPerf logging.
I actually did not use mlperf-logging, but thanks for the feedback! I'm wondering if it's because I was using the mlperf-binloader.
@latifisalar @bilgeacun |
Hi, I'm currently trying to integrate TT-Embedding into the original DLRM code base, and I've successfully reproduced the result shown in the readme. However, I'm not quite sure what the essential changes to make are.
Right now I'm replacing the original embeddingbag function (within create_emb in the dlrm_s_pytorch.py file) in DLRM with TTEmbeddingBag, but I'm having trouble figuring out the correct parameters for the function. The parameters I'm using right now are:
I left tt_p_shapes and tt_q_shapes blank, since each layer's embedding dimension and number of embeddings are different.
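When those shapes are left blank, the implementation has to pick a factorization itself; a hedged sketch of how one might factor a table size into three near-equal `tt_p_shapes` factors (the helper name is mine, and I'm assuming the implementation pads the table up to the product of the factors, so product >= rows is enough):

```python
import math

def suggest_tt_shape(n, k=3):
    """Factor n into k near-equal integers whose product is >= n.
    Assumed helper for illustration, not part of the TTEmbeddingBag API:
    TT implementations typically pad the table rows up to the product."""
    shape = []
    remaining = n
    for cores_left in range(k, 0, -1):
        # smallest integer f with f ** cores_left >= remaining,
        # computed without trusting float rounding
        f = max(1, round(remaining ** (1.0 / cores_left)))
        while f ** cores_left < remaining:
            f += 1
        while f > 1 and (f - 1) ** cores_left >= remaining:
            f -= 1
        shape.append(f)
        remaining = math.ceil(remaining / f)
    return shape

print(suggest_tt_shape(125000))      # [50, 50, 50]
print(suggest_tt_shape(40000000))    # [342, 342, 342], product 40001688
```

The same idea applies to `tt_q_shapes` for the embedding dimension (e.g. 64 -> [4, 4, 4]).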
The paper mentioned that the TT-ranks used were [8, 16, 32, 64], but I wasn't able to use that parameter, since it fails the assertion len(self.tt_p_shapes) <= 4. Therefore I used the same parameters as in the example ([12, 14]), and that results in a CUDA illegal memory access error at line 174 in tt_embedding_ops. The full error message is attached below:
Thanks!
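On the failing assertion: with k tensor cores there are only k-1 internal TT ranks (the boundary ranks are fixed at 1), so a rank list of length 4 implies 5 cores, which trips `len(self.tt_p_shapes) <= 4`. A plausible reading of the paper is that [8, 16, 32, 64] are four separate uniform-rank configurations rather than one rank list. A sketch of that interpretation (the helper is illustrative, not from the repo):

```python
# Sketch: with k cores there are k-1 internal TT ranks, so e.g. rank 32
# with 3 cores means tt_ranks=[32, 32]. A list like [8, 16, 32, 64]
# would imply 5 cores and fail len(tt_p_shapes) <= 4.
def tt_ranks_for(num_cores, rank):
    assert 2 <= num_cores <= 4, "implementation supports at most 4 cores"
    return [rank] * (num_cores - 1)

print(tt_ranks_for(3, 32))  # [32, 32]
print(tt_ranks_for(3, 8))   # [8, 8]
```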