Training transformer model goes from score 0.97 to ZERO #12301
-
Hi @mbrunecky! Thanks for the report, that's definitely unexpected and not ideal :/ One thing I noticed looking at the training log is that there seems to be quite a bit of fluctuation/variation in your data - e.g. the loss also varies a lot between iterations 1600 and 4800, for instance. Nevertheless, the training score increases nicely until the sudden jump at 11200. Out of curiosity, what do the model predictions of the

Some more poking we could do to try to understand what is happening: you could run this with a random subsample of your data (maybe a few times with different sets) to see whether this keeps occurring or not. This is just to verify whether there might be a few "bad samples" messing things up. Alternatively, a lower learning rate might make this more stable. Either way - we'd love to get to the bottom of this!
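The subsampling experiment suggested above can be sketched as follows. This is a minimal illustration, not spaCy API: `make_subsamples` is a hypothetical helper, and the integer list stands in for real annotated training examples (which would typically be `Doc` objects loaded from a `DocBin`).

```python
import random

def make_subsamples(examples, fraction=0.5, n_runs=3, seed=0):
    """Draw several independent random subsets of the training data.

    Training a model on each subset separately shows whether the score
    collapse reproduces regardless of which samples are present - if it
    only happens for some subsets, a few specific "bad samples" are a
    likely culprit.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    k = max(1, int(len(examples) * fraction))
    # Each subset is drawn independently without replacement
    return [rng.sample(examples, k) for _ in range(n_runs)]

# Stand-in for real annotated examples (hypothetical data):
examples = list(range(100))
subsets = make_subsamples(examples, fraction=0.5, n_runs=3)
```

Each returned subset would then be written out as its own training corpus and trained on separately, keeping everything else in the config identical.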
-
I'm transferring this to the issue tracker because it does feel like a bug.
-
I am training an NER component using a transformer model.
On one of my data sets, during epoch 2, the score reaches 0.97 and then (after a huge loss) drops to ZERO, where it stays until the process dies with an out-of-memory error.
What should I be looking for as the reason for this behavior?
Configuration:
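(The configuration attached to the original report is not reproduced here. For reference only - this is not the reporter's actual config - the learning rate that the reply above suggests lowering lives in the optimizer block of a typical spaCy transformer training config:)

```ini
[training.optimizer]
@optimizers = "Adam.v1"

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
```

Lowering `initial_rate` (e.g. to 0.00002) is the kind of change the reply has in mind when suggesting a more stable training run.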