I'm using T2T version 1.4.1 with 2 × 1080 Ti GPUs and a batch size of 2048. When I trained translate_ende_wmt32k with the big_single_gpu hparams set for ~600k steps, I got a BLEU score of 26.02, which is only ~2 BLEU below the result reported in the "Attention Is All You Need" paper. But when I use the same setup and parameters to train translate_enfr_wmt32k, I get around 33 BLEU after 1.4M steps, which is a whole 9 BLEU points below the paper's result. That seems like too large a gap to be explained by the smaller number of GPUs alone. On Gitter, @lukaszkaiser suggested it might be a tokenization issue. I'm not sure what the problem is at this point, but the results look suspicious to me. I'd appreciate any advice on what might be happening here.
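To check how much of the gap could just be a tokenization/scoring artifact (the paper reports tokenized BLEU, while my score is on detokenized output), I was thinking of rescoring my decodes roughly like the sketch below. This assumes the sacrebleu package is installed, and `decodes.txt` / `ref.txt` are placeholder names for my decoded translations and the newstest2014 references, one sentence per line:

```python
# Minimal sketch: compare BLEU under two tokenization settings to see
# whether scoring differences explain part of the EnFr gap.
# Assumptions: sacrebleu is installed; decodes.txt and ref.txt are
# hypothetical files with one sentence per line, aligned by index.
import sacrebleu

with open("decodes.txt") as f:
    hyps = [line.strip() for line in f]
with open("ref.txt") as f:
    refs = [line.strip() for line in f]

# Default (13a) tokenization.
print("BLEU (13a):", sacrebleu.corpus_bleu(hyps, [refs]).score)

# International tokenization, which is often used for EnFr and can
# differ from 13a by a noticeable margin.
print("BLEU (intl):", sacrebleu.corpus_bleu(hyps, [refs], tokenize="intl").score)
```

Even so, I'd expect tokenization differences to account for at most a couple of BLEU points, not nine, so I suspect something else is also going on.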