The aim of this work is to show the improvement obtained in a Variational Autoencoder (VAE) by replacing the Leaky ReLU (LReLU) activations of the base model with GELUs. The GELU was introduced by Hendrycks and Gimpel (2016) in "Gaussian Error Linear Units (GELUs)", where it yielded better results in computer vision, natural language processing, and automatic speech recognition tasks than models using ReLUs or ELUs.
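For reference, a minimal sketch of the activation itself, assuming NumPy and SciPy are available: `gelu` is the exact form x·Φ(x) and `gelu_tanh` is the tanh approximation given by Hendrycks and Gimpel (2016).

```python
import numpy as np
from scipy.stats import norm

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return x * norm.cdf(x)

def gelu_tanh(x):
    # Tanh approximation from Hendrycks & Gimpel (2016)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```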
The code is based on the approach taken by Foster at github.com/davidADSP/GDL_code.
1. The encoder network compresses high-dimensional input data into a lower-dimensional representation (see the sketch after this list).
2. The decoder network decompresses the low-dimensional representation, reconstructing the input data.
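A minimal sketch of this encoder/decoder structure, assuming a Keras implementation on 28×28 grayscale images (MNIST-like) with a two-dimensional latent space; the layer sizes and names are illustrative and not Foster's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2  # hypothetical 2-D latent space, as in the visualizations discussed below

# Encoder: compresses the image into a low-dimensional representation vector.
encoder_input = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(encoder_input)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
encoder_output = layers.Dense(latent_dim)(x)
encoder = Model(encoder_input, encoder_output, name="encoder")

# Decoder: reconstructs the image from the latent representation.
decoder_input = layers.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(decoder_input)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid")(x)
decoder = Model(decoder_input, decoder_output, name="decoder")

# Autoencoder: encoder followed by decoder, trained to reconstruct its own input.
autoencoder = Model(encoder_input, decoder(encoder(encoder_input)), name="autoencoder")
autoencoder.compile(optimizer="adam", loss="mse")
```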
Foster (2019) explains that the autoencoder is trained to find weights that minimize the loss between the original input and its reconstruction. The representation vector shown in Figure 13 demonstrates the compression of the input image into a smaller dimension, called the latent space, from which the decoder starts the reconstruction of the input image. By choosing a point in the latent space, shown on the right of the figure, the decoder should be able to generate images within the distribution of the original data. However, depending on the point chosen in this two-dimensional latent space, the decoder will not be able to generate images correctly. There is also a lack of symmetry: looking at the y axis of the latent space, the number of points with y < 0 is much greater than with y > 0, and there is a large concentration around the point (0, 0). Finally, the coloring shows that some digits are represented in very small, overlapping areas.
In addition to the aforementioned problems, the decoder must be able to generate different types of digits. According to Foster (2019), if the autoencoder is too free to choose how to use the latent space to encode the images, there will be huge gaps between groups of similar points, and the regions between them will not generate correct images. The Variational Autoencoder addresses these shortcomings of the autoencoder and turns it into a generative model. In an autoencoder, each image is mapped directly to a point in the latent space, whereas in a VAE each image is mapped to a multivariate normal distribution around a point.
The encoder only maps the input to a mean vector and a variance vector; it does not model the covariance (the numerical interdependence between two random variables) between the dimensions. Since the output of the neural network can be any real number in (−∞, ∞), it is preferable to map the logarithm of the variance (FOSTER, 2019).
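A sketch of how this mapping is typically implemented in Keras, assuming a two-dimensional latent space; the `Sampling` layer, the 128-unit feature input, and the layer names are illustrative rather than the exact code of the base model.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2  # hypothetical 2-D latent space

class Sampling(layers.Layer):
    """Draws z ~ N(mu, sigma^2) with the reparameterization trick."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        # sigma = exp(0.5 * log_var); predicting the log-variance keeps the network output unconstrained
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# The encoder head maps the same feature vector to a mean vector and a log-variance vector
features = layers.Input(shape=(128,))               # hypothetical flattened encoder features
z_mean = layers.Dense(latent_dim, name="mu")(features)
z_log_var = layers.Dense(latent_dim, name="log_var")(features)
z = Sampling()([z_mean, z_log_var])
```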
In our experiment, we tested the base model with LReLUs, a model replacing the LReLUs with GELUs only in the encoder, a model with the same replacement only in the decoder, and a last model with the replacement in both the encoder and the decoder (Full-GELU). The reconstruction loss curves are shown on the left and the KL loss curves on the right; the light, thin curves correspond to the test-set log losses, and the last figure shows the overall VAE loss.
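For illustration, the four variants can be summarized by a small configuration like the one below, assuming TensorFlow 2.4+ (where the "gelu" activation string is available); the variant names are our own labels, not identifiers from Foster's code.

```python
from tensorflow.keras import layers

def activation_layer(kind):
    """Returns the activation layer used after each convolution.
    'lrelu' reproduces the base model; 'gelu' is the replacement under test."""
    if kind == "lrelu":
        return layers.LeakyReLU()
    return layers.Activation("gelu")

# The four variants tested: which activation is used in the encoder and in the decoder
VARIANTS = {
    "base":      {"encoder": "lrelu", "decoder": "lrelu"},
    "enc-gelu":  {"encoder": "gelu",  "decoder": "lrelu"},
    "dec-gelu":  {"encoder": "lrelu", "decoder": "gelu"},
    "full-gelu": {"encoder": "gelu",  "decoder": "gelu"},
}
```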
We can observe that replacing the activation layers only in the encoder or only in the decoder already yields better performance than the base model. It is important to note that using GELUs in the encoder improved results more than using them in the decoder, possibly because the decoder input tends to follow a normal distribution: the Kullback-Leibler divergence regularizes the encoder so that the data representations are placed in the latent space as a normal distribution. In the Full-GELU model, in which we replaced all LReLUs with GELUs, we noticed a significant improvement over the base model and over the other variants.
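For reference, a sketch of the two loss terms discussed above, assuming Keras tensors with shapes as in the earlier snippets; the reconstruction weighting `r_loss_factor` is illustrative (Foster's code uses a similar weighting), not necessarily the value used in our runs.

```python
import tensorflow as tf

def kl_loss(z_mean, z_log_var):
    # KL divergence between N(mu, sigma^2) and the standard normal prior N(0, I),
    # summed over the latent dimensions and averaged over the batch; this is the
    # term that regularizes the latent space.
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
    return tf.reduce_mean(kl)

def total_loss(x, x_reconstructed, z_mean, z_log_var, r_loss_factor=1000.0):
    # Overall VAE loss: weighted reconstruction error plus the KL term.
    # The squared error is summed over the pixel dimensions and averaged over the batch.
    r_loss = tf.reduce_mean(
        tf.reduce_sum(tf.square(x - x_reconstructed), axis=[1, 2, 3]))
    return r_loss_factor * r_loss + kl_loss(z_mean, z_log_var)
```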
The dropout rate may have contributed to the deterioration relative to the Full-GELU model. Therefore, in future work we will test different dropout values that preserve the reduction in overfitting while optimizing the result of the Full-GELU model.
FOSTER, D. Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play. 1st ed. O'Reilly Media, 2019.
HENDRYCKS, D.; GIMPEL, K. Gaussian Error Linear Units (GELUs). arXiv preprint arXiv:1606.08415, 2016.