How to avoid the long time waiting before start training? #3133
Comments
Hello! I'm sorry to hear that you've been getting such long delays. I'd like to make something clear: this is not the expected behaviour.
The low-but-not-none GPU utilization could mean that prior to training, you are running a very large evaluator (which doesn't log anything unless you set your logging level to INFO). A lot of the training scripts run an evaluator prior to training, because the evaluated results will be automatically included in the model card, so you can easily see the effects from before training vs. after training. Beyond that, before training officially starts, the datasets are also prepared. My recommendations:
Those are my ideas at this time; perhaps they can help you get past this. I do feel like it might be as simple as "slow CPUs", though! Or perhaps something with Linux, considering I've not encountered this myself.
Thank you very much. After profiling, I found that it was the TripletEvaluator that caused the long wait before training started. After removing the TripletEvaluator, training started immediately. Thank you!
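If the pre-training evaluation is the bottleneck but a model-card score is still wanted, an alternative to removing the evaluator entirely is to run it on a small random subset of the eval triplets. A sketch using a hypothetical `subsample` helper (not a sentence-transformers API):

```python
import random

def subsample(triplets, k, seed=42):
    # Hypothetical helper: keep at most k random triplets so the
    # pre-training evaluation stays cheap.
    triplets = list(triplets)
    if len(triplets) <= k:
        return triplets
    rng = random.Random(seed)
    return rng.sample(triplets, k)

eval_triplets = [(f"anchor {i}", f"pos {i}", f"neg {i}") for i in range(100_000)]
small_eval = subsample(eval_triplets, 1_000)
```

The 1,000-triplet subset can then be handed to the evaluator in place of the full evaluation split, cutting the pre-training wait roughly proportionally.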
I'm very happy to hear that you got it working! It got me a bit concerned 😆
@tomaarsen Dear Tom, just a quick question. Although the TripletEvaluator time was avoided, there is still a long CPU-only phase before training. What I saw was that at the beginning, it used 100% CPU to map all the datasets, and during this time the GPU did nothing. After the dataset mapping was done (30 minutes for small datasets, hours for big ones), CPU utilization dropped to 10% and the GPU started training with utilization as high as 80%. Why is that? How can the dataset mapping be parallelized with the training?
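If the CPU-only phase is an eager Hugging Face `datasets` preprocessing pass, two common levers are multiprocess mapping (`Dataset.map(..., num_proc=N)`) and streaming (`load_dataset(..., streaming=True)`), where examples are processed lazily while training iterates rather than all up front. A pure-Python sketch of the lazy idea only (`fake_tokenize` is a stand-in, not a real API):

```python
def fake_tokenize(text):
    # Stand-in for real tokenization.
    return text.split()

def stream_tokenize(texts):
    # Lazily tokenize one example at a time, analogous to a
    # streaming IterableDataset: the work happens while the
    # consumer (the training loop) iterates, not in one big
    # up-front mapping pass.
    for text in texts:
        yield fake_tokenize(text)

corpus = (f"sentence number {i}" for i in range(3))
batches = list(stream_tokenize(corpus))
```

With a streamed dataset the GPU can start training almost immediately, at the cost of the CPU preprocessing running concurrently throughout training instead of once beforehand.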
Dear developer,
Thanks for the great sentence-transformers library!
I am finetuning the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 using my own data following the tutorial from: https://sbert.net/docs/sentence_transformer/training_overview.html
I first finetuned it with a toy dataset containing only a few hundred triplet samples; everything was fine and the finetuning was very fast.
After that, I finetuned it with the full dataset containing 100 million triplet samples. I found that it had to wait a long time (about 60 minutes) before training started, and the bigger the data, the longer the wait.
Specifically:
- It printed "Generating train split".
- It warned: "Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher."
- During the 60 minutes, the GPU was working but its utilization was relatively low (30%) and GPU memory was not used. Moreover, no log information was printed during that time. Was it doing something like data preparation or tokenization?
After the 60-minute wait, the real training started: GPU utilization was as high as 80% and about 70 GB of GPU memory was used on an H100. Moreover, the training progress bar was printing something like
x/y [69:08:34<130:13:54, 1.09it/s]
so I knew it was training. I also have another dataset that is 10 times larger than the 100 million triplet samples; I worry that I will have to wait days for training to start if I use that huge dataset.
Could you tell me what it was doing during the 60-minute wait, and how to avoid this long waiting time?
Thank you very much; I look forward to your reply.