With every new generation of Pokémon, a whole slew of new species is introduced to the game, bringing the total to over 800 Pokémon species to date! Wouldn't it be cool if we could use these Pokémon to train a model that generates new ones for us?
PokéGAN is a GAN (Generative Adversarial Network) trained on images obtained from this repo. It involves two competing neural networks: a generator and a discriminator.
The generator takes as input a random variable drawn from a simple distribution (a Gaussian distribution, in our case) and is trained to output a sample from a given target distribution (i.e. Pokémon sprites). The discriminator functions as a classifier: given real images from our dataset or fake images produced by the generator, its task is to classify them correctly as real or fake.
The two models engage in a game where the generator tries to fool the discriminator while the discriminator has to correctly differentiate real from fake images. This minimax game pushes the generator to generate images that resemble those drawn from the target distribution (i.e. generate new Pokémon!).
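To make this concrete, here's a minimal sketch of the two networks in TensorFlow/Keras, loosely following the structure of TensorFlow's DCGAN tutorial. The layer widths and the 28x28 RGB output are illustrative, not the exact architecture used:

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100  # size of the Gaussian noise vector fed to the generator

def make_generator():
    # Upsamples a noise vector into a 28x28 RGB image with values in [-1, 1].
    return tf.keras.Sequential([
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(LATENT_DIM,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, 5, strides=1, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", use_bias=False),  # 14x14
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # tanh keeps outputs in [-1, 1], matching the normalized training images
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),  # 28x28
    ])

def make_discriminator():
    # Maps a 28x28 image to a single logit: positive = real, negative = fake.
    return tf.keras.Sequential([
        layers.Conv2D(64, 5, strides=2, padding="same", input_shape=(28, 28, 3)),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, 5, strides=2, padding="same"),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1),  # raw logit; the loss applies the sigmoid
    ])
```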
My first iteration is based on TensorFlow's tutorial. I incorporated some commonly recommended GAN tricks (a few of them are sketched in code after this list), such as:
- Normalize images to [-1, 1]
- Use tanh as the last layer of the generator (outputs in the range [-1, 1])
- Sample noise from a Gaussian distribution rather than a uniform one
- Use separate batches of real and fake images
- Use Leaky ReLU in both the generator and the discriminator
- Label smoothing: instead of hard labels (1 = real, 0 = fake), use random numbers in [0.7, 1.0] for real images and [0.0, 0.3] for fake images
- Noisy labels: occasionally flip the labels fed to the discriminator (e.g. with 5% probability)
- Use the Adam optimizer
- Use dropout (e.g. 50% in the generator)
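Here's a sketch of how a few of these tricks might look inside a single training step, reusing `LATENT_DIM` and the model factories from the sketch above. The smoothing ranges and the 5% flip probability mirror the list; everything else is illustrative:

```python
import tensorflow as tf

generator = make_generator()
discriminator = make_discriminator()

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

def noisy_labels(batch_size, low, high, flip_prob=0.05):
    # Label smoothing: draw labels uniformly from [low, high] instead of hard 0/1.
    labels = tf.random.uniform((batch_size, 1), low, high)
    # Noisy labels: flip a small fraction to the opposite side of the range.
    flip = tf.cast(tf.random.uniform((batch_size, 1)) < flip_prob, tf.float32)
    return labels * (1.0 - flip) + (1.0 - labels) * flip

@tf.function
def train_step(real_images):
    batch_size = tf.shape(real_images)[0]
    noise = tf.random.normal((batch_size, LATENT_DIM))  # Gaussian, not uniform

    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        # Separate batches: real and fake images go through in distinct passes.
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)

        d_loss = (cross_entropy(noisy_labels(batch_size, 0.7, 1.0), real_logits)
                  + cross_entropy(noisy_labels(batch_size, 0.0, 0.3), fake_logits))
        # The generator wants the discriminator to call its images real.
        g_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)

    gen_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))
    disc_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                                 discriminator.trainable_variables))
```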
This model worked really well on 28x28 images. Here's a gif of the generated images produced during training:
However, the sprites produced are quite small and their details aren't very clear. This is because the dataset images have a lot of whitespace around each sprite, so the Pokémon fills only a small portion of the canvas. I preprocessed the images using Wand to trim the whitespace while keeping the 128x128 dimensions, giving a more consistent view of the sprites.
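The preprocessing itself is nearly a one-liner per image with Wand's `trim()`. A rough sketch, assuming the sprites live in a `sprites/` directory with near-white borders (the fuzz tolerance and paths are illustrative):

```python
from pathlib import Path
from wand.image import Image

def trim_sprite(src_path, dst_path, size=128):
    # Crop away the surrounding whitespace, then scale back up so every output
    # is size x size with the Pokémon filling most of the frame. Note that the
    # plain resize slightly stretches non-square crops.
    with Image(filename=str(src_path)) as img:
        img.trim(fuzz=0.05 * img.quantum_range)  # tolerate near-white pixels
        img.resize(size, size)
        img.save(filename=str(dst_path))

out_dir = Path("trimmed")
out_dir.mkdir(exist_ok=True)
for png in Path("sprites").glob("*.png"):
    trim_sprite(png, out_dir / png.name)
```

The generated images now look like this: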
However, we see mode collapse occurring. So, I looked to newer models, such as Progressive GAN and StyleGAN, to generate higher-quality images. I settled on Progressive GAN first because it is simpler and is what StyleGAN is based on. I experimented with a mix of DCGAN and Progressive GAN without the fade-in layers; a rough sketch of the growing scheme follows.
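Here's how the no-fade-in growing might look, assuming simple UpSampling2D + conv stages (the helper names and filter counts are mine, not from either paper):

```python
import tensorflow as tf
from tensorflow.keras import layers

def upsample_block(filters):
    # One growth stage: double the spatial resolution, then refine with convs.
    return tf.keras.Sequential([
        layers.UpSampling2D(),
        layers.Conv2D(filters, 3, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(filters, 3, padding="same"),
        layers.LeakyReLU(0.2),
    ])

def base_generator(latent_dim=100):
    # Trunk that outputs 32x32 RGB images.
    return tf.keras.Sequential([
        layers.Dense(4 * 4 * 256, input_shape=(latent_dim,)),
        layers.LeakyReLU(0.2),
        layers.Reshape((4, 4, 256)),
        upsample_block(256),  # 8x8
        upsample_block(128),  # 16x16
        upsample_block(128),  # 32x32
        layers.Conv2D(3, 1, padding="same", activation="tanh"),  # to-RGB head
    ])

def grow_generator(base, filters):
    # Drop the old to-RGB head, keep the trained trunk (the layer objects and
    # their weights are reused), and append a new stage plus a new head.
    # No fade-in: training simply continues at the new resolution.
    trunk = tf.keras.Sequential(base.layers[:-1])
    return tf.keras.Sequential([
        trunk,
        upsample_block(filters),
        layers.Conv2D(3, 1, padding="same", activation="tanh"),
    ])

g32 = base_generator()          # train at 32x32 first...
g64 = grow_generator(g32, 64)   # ...then grow to 64x64...
g128 = grow_generator(g64, 32)  # ...and finally 128x128
```

With three stages, 32x32 → 64x64 → 128x128, here's what it looked like: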
Here are a few things I've experimented with, none with perfect results yet:
- Train only on a certain type of Pokémon
- Spectral normalization in DCGAN
- cDCGAN: conditional DCGAN using Pokémon types as labels with binary discriminator output (slow convergence)
- Use a 1:2 generator/discriminator learning-rate ratio (i.e. generator = 1e-4, discriminator = 2e-4) (source)
- Cycle between resolutions: 64x64 (50 epochs) -> 128x128 (200 epochs) -> 64x64 (50 epochs) -> ...
  - Lower resolutions learn the outline; higher resolutions learn the details
  - Use a higher learning rate at low resolutions
  - Average results: good outlines, but no detail. Train more epochs at high res?

The main reason these techniques haven't yielded results yet is a lack of training time, as I'm training on Google Colab with its 12-hour GPU limit.
Currently, the app uses a Progressive GAN with equalized learning rate, adapted from here. It is also trained on only grass-type Pokémon; the equalized-learning-rate idea is sketched below.
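For reference, equalized learning rate initializes weights from N(0, 1) and applies the He scaling constant at runtime instead of at initialization, so every layer's effective update size is comparable. A minimal sketch of the idea as a custom Keras layer (my simplified version, not the code the app actually uses):

```python
import numpy as np
import tensorflow as tf

class EqualizedConv2D(tf.keras.layers.Layer):
    # Conv layer whose weights are drawn from N(0, 1) and multiplied by
    # the He constant sqrt(2 / fan_in) on every forward pass.
    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.kernel_size = kernel_size

    def build(self, input_shape):
        in_channels = int(input_shape[-1])
        fan_in = self.kernel_size * self.kernel_size * in_channels
        self.scale = np.sqrt(2.0 / fan_in)
        self.kernel = self.add_weight(
            name="kernel",
            shape=(self.kernel_size, self.kernel_size, in_channels, self.filters),
            initializer=tf.random_normal_initializer(0.0, 1.0))
        self.bias = self.add_weight(name="bias", shape=(self.filters,),
                                    initializer="zeros")

    def call(self, x):
        # Scaling here (rather than folding it into the init) keeps the
        # per-layer gradient magnitudes comparable under Adam.
        return tf.nn.conv2d(x, self.kernel * self.scale,
                            strides=1, padding="SAME") + self.bias
```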
A few ideas I'd like to explore next:
- StyleGAN
- Split dataset by types or colours: conditional StyleGAN/ProgressiveGAN?
- More filters == sharper images? Bigger kernels == smoother images? (source)
- ProgGAN might need more epochs (paper recommends 800k samples per layer)
- cDCGAN: conditional DCGAN using labels (i.e. Pokémon types)
  - Use N+1 discriminator classes (N types plus one "fake" class) instead of a binary output; sketched below
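For reference, here's a sketch of the N+1-class idea: the discriminator predicts one of N type classes for real images and a dedicated "fake" class for generated ones, so realness and type are learned jointly. `NUM_TYPES = 18` and the layer sizes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_TYPES = 18  # Pokémon types; class index NUM_TYPES below means "fake"

def make_multiclass_discriminator():
    # Outputs N+1 logits instead of a single real/fake score.
    return tf.keras.Sequential([
        layers.Conv2D(64, 5, strides=2, padding="same", input_shape=(128, 128, 3)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 5, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(NUM_TYPES + 1),
    ])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def discriminator_loss(disc, real_images, type_labels, fake_images):
    # Real images are labelled with their type; every generated image gets
    # the extra "fake" class.
    fake_labels = tf.fill([tf.shape(fake_images)[0]], NUM_TYPES)
    return (loss_fn(type_labels, disc(real_images, training=True))
            + loss_fn(fake_labels, disc(fake_images, training=True)))
```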