Issue with EnFr task. Maybe a tokenization problem? #518
I have no experience with enfr, so I cannot really help. It would be nice to create a wiki page for reporting replications of the ende and enfr results with various T2T versions and various hyperparams (esp. number of GPUs, batch size). That said, enfr has about 8 times more training data than ende. More GPUs result in a higher effective batch size, and this may influence (improve) the results, especially in later stages.
If you have only 2 GPUs, run the base model with 2x4096; you'll get better results.
On a GTX 1080 Ti with 11 GB memory, transformer_big_single_gpu and a recent version of T2T, I can use at most batch_size=2000 (2050 fails with OOM). I guess this may differ slightly depending on the maximum sentence length in the training data (so 2048 is possible for some datasets, but not much more).
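For reference, a minimal training invocation along these lines might look as follows. This is a sketch only, not a command from this thread: the paths, step count, and hyperparameter values are placeholders, the flag names are from T2T 1.x (the problem flag is spelled `--problems` in 1.4.x and `--problem` in later versions), and `batch_size` is measured in subwords per GPU.

```bash
# Sketch only: transformer_base on 2 GPUs with a 4096-subword batch per GPU,
# as suggested above. Lower batch_size if you hit OOM on smaller cards.
DATA_DIR=$HOME/t2t_data            # placeholder path
OUTPUT_DIR=$HOME/t2t_train/enfr    # placeholder path
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problems=translate_enfr_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --hparams='batch_size=4096' \
  --worker_gpu=2 \
  --train_steps=500000             # placeholder; enfr may need more steps
```

The same pattern with `--hparams_set=transformer_big` and a higher `--worker_gpu` applies to the multi-GPU big-model runs discussed below.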
Hi, I tried to use the transformer_enfr_big setting to run the wmt_enfr32k data. On an M40 with 24 GB memory, it only supports a batch size of 2000, so I trained the model with 8 M40s x 2000. After training for about 90000 steps, the BLEU is only 27.x, which is far from the reported result. It is very hard to reproduce the wmt_enfr32k result.
I have one more tip on how to increase the batch size: set the …
@vince62s, would you please kindly share your training settings and evaluation details? What were the hparams setting, training steps, and batch size, and did you test on raw or tokenized data, with multi-bleu.perl? I have tried several times to run the transformer_big_enfr and transformer_big settings, but they only reached a BLEU score of about 32.x, which is far lower than the paper and your result of 38. I am quite confused. Could you please help a little? I'd appreciate it a lot.
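As an aside, one way to make the evaluation question above concrete is to decode once and score the output both ways. This is a hedged sketch, not a recipe from this thread: file names are placeholders, `t2t-bleu` (which prints the `BLEU_uncased`/`BLEU_cased` values quoted later in the thread) only exists in reasonably recent T2T versions, and the `.tok` files are assumed to be Moses-tokenized copies produced separately.

```bash
# Sketch only: decode the test set, then compare detokenized vs. tokenized BLEU.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problems=translate_enfr_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --decode_hparams='beam_size=4,alpha=0.6' \
  --decode_from_file=newstest2014.en \
  --decode_to_file=newstest2014.out.fr

# Detokenized BLEU against the raw reference:
t2t-bleu --translation=newstest2014.out.fr --reference=newstest2014.fr

# Tokenized BLEU with Moses multi-bleu.perl (reference and hypothesis both
# pre-tokenized; mosesdecoder is assumed to be checked out locally):
perl mosesdecoder/scripts/generic/multi-bleu.perl newstest2014.tok.fr \
  < newstest2014.out.tok.fr
```

Comparing the two numbers against whichever protocol the paper used helps separate a genuine quality gap from a scoring/tokenization mismatch.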
For this one I did nothing special, but I have 4 GPUs and can fit a batch of 4096 on each. If you are on 1 GPU you can't expect good results from the big model after 90K steps. Start with the base.
@vince62s Do you mean you used the transformer_base() setting? Actually, I ran the transformer_big experiment on 8 GPUs with batch size 5500, but after more than 100k steps I still got a low BLEU score. Would you kindly share the training script? Thanks.
Is there any progress on reproducing En-Fr?
I report the result of “BLEU_uncased = 35.53 …”
In fairseq, after 2 epochs, I report the result: …
I use the 1.4.1 version of T2T with 2 x 1080 Ti and a batch size of 2048.
When I trained translate_ende_wmt32k with the big_single_gpu model for ~600k steps, I got a BLEU score of 26.02, which is only ~2 BLEU points less than reported in the "Attention Is All You Need" paper.
But when I use the same set-up and parameters to train a translate_enfr_wmt32k model, I get around 33 BLEU after 1.4M steps, which is a whole 9 BLEU points less than the result in the paper. That seems like too much to be compensated for by the number of GPUs.
On Gitter, @lukaszkaiser assumed that it might be a tokenization issue.
I'm not sure what the issue is at this point, but the results look somewhat suspicious to me.
Hope you can advise what might be happening here.
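As a rough sanity check on the batch-size point made earlier in the thread: the paper reports training batches of approximately 25000 source tokens. The arithmetic below is back-of-the-envelope only and treats subwords as tokens.

```bash
# 2 GPUs x 2048 subwords per GPU vs. ~25000 tokens per batch in the paper:
echo $((2 * 2048))       # 4096 subwords per step
echo $((25000 / 4096))   # roughly 6x smaller effective batch than the paper
```

So, independently of any tokenization question, the effective batch in this setup is several times smaller than the one behind the reported EnFr number.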