- What does Attention in Neural Machine Translation Pay Attention to? paper
Some interesting findings (more details in the paper):
~ Only about 54% of attention mass falls on alignment points overall; by POS tag: NUM 73%, NOUN 68%, VERB just 49%, and PRT (particles, such as 's, off, up) just 36%.
~ Attention accuracy on alignments is high for NOUN and very low for VERB, yet the target-word prediction losses are about the same for both.
- REGULARIZING NEURAL NETWORKS BY PENALIZING CONFIDENT OUTPUT DISTRIBUTIONS paper
1. Label smoothing;
2. Confidence penalty (add the negative entropy of the output distribution to the loss, penalizing over-confident predictions); a minimal sketch of both is below.
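A minimal PyTorch-style sketch of the two regularizers for a standard classification setup, assuming logits and integer targets; the hyperparameter values (`smoothing=0.1`, `beta=0.1`) are illustrative rather than the paper's, and `label_smoothing` in `F.cross_entropy` needs a reasonably recent PyTorch:

```python
import torch
import torch.nn.functional as F

def smoothed_ce(logits, targets, smoothing=0.1):
    # 1. Label smoothing: mix the one-hot target with a uniform distribution.
    return F.cross_entropy(logits, targets, label_smoothing=smoothing)

def confidence_penalty_ce(logits, targets, beta=0.1):
    # 2. Confidence penalty: cross-entropy minus beta * entropy of the model's
    #    output distribution, so low-entropy (over-confident) outputs are penalized.
    ce = F.cross_entropy(logits, targets)
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return ce - beta * entropy
```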
- R-Drop: Regularized Dropout for Neural Networks paper: add a KL-divergence term between the outputs of two forward passes with different dropout masks (sketch below).
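A sketch of the R-Drop training loss under stated assumptions (a classifier `model` in training mode so dropout is active, inputs `x`, integer `targets`); the weight `alpha` is a task-dependent hyperparameter, and this is not the paper's official implementation:

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, targets, alpha=1.0):
    # Two forward passes: dropout samples a different mask each time,
    # so the two predictive distributions differ.
    logits1 = model(x)
    logits2 = model(x)
    # Usual cross-entropy, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, targets) + F.cross_entropy(logits2, targets))
    # Symmetric (bidirectional) KL divergence between the two distributions.
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p1, log_p2, reduction="batchmean", log_target=True)
                + F.kl_div(log_p2, log_p1, reduction="batchmean", log_target=True))
    return ce + alpha * kl
```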