Improved Precision and Recall Metric for Assessing Generative Models - Unofficial PyTorch Implementation
- given two directories containing real and fake images
python improved_precision_recall.py [path_real] [path_fake]
- pre-compute real manifold and save to a file
python improved_precision_recall.py [path_real] [dummy_str] --fname_precalc [filename_dest]
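- to reuse the pre-computed manifold later, it can presumably be passed in place of [path_real], since compute_manifold_ref below accepts either a directory or a pre-computed file:
python improved_precision_recall.py [filename_dest] [path_fake]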
- if the images are already on memory
ipr = IPR(args.batch_size, args.k, args.num_samples)
ipr.compute_manifold_ref(args.path_real) # args.path_real can be either directory or pre-computed manifold file
metric = ipr.precision_and_recall(images)
print('precision =', metric.precision)
print('recall =', metric.recall)
- realism score
realism_score = ipr.realism(image_in_tensor)
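For reference, the paper defines the realism score of a single feature vector phi_g as the maximum over real features phi_r of NND_k(phi_r) / ||phi_g - phi_r||, where NND_k is the k-NN radius (the paper additionally prunes real features with above-median radii for robustness). Below is a minimal NumPy sketch on raw feature vectors that omits that pruning; the actual implementation works on deep image features:

import numpy as np

def realism(phi_g, real_feats, k=3):
    # k-NN radius of every real feature; column 0 of each sorted row
    # is the zero self-distance, so index k gives the k-th neighbor
    d = np.linalg.norm(real_feats[:, None] - real_feats[None, :], axis=-1)
    d.sort(axis=1)
    radii = d[:, k]
    # a score >= 1 means phi_g lies inside at least one real k-NN ball
    dist = np.linalg.norm(real_feats - phi_g, axis=-1)
    return float(np.max(radii / np.maximum(dist, 1e-12)))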
-
Corner case
- For A = {999 samples from uniform(0,1)} + {2} and B = {999 samples from uniform(2,3)} + {1},
precision = 1 and recall = 1: each outlier's k-NN ball is large enough to cover the other set entirely (reproduced in the sketch below).
- Outliers can be handled by estimating the quality of individual samples and pruning them out.
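This corner case is easy to reproduce in one dimension. A minimal NumPy sketch of the k-NN coverage test (the repository operates on deep image features; this only illustrates the geometry):

import numpy as np

def coverage(X, Y, k=3):
    # fraction of Y that falls inside the k-NN balls around the points of X
    d = np.abs(X[:, None] - X[None, :])
    d.sort(axis=1)
    radii = d[:, k]                       # column 0 is the zero self-distance
    dy = np.abs(Y[:, None] - X[None, :])
    return float((dy <= radii[None, :]).any(axis=1).mean())

rng = np.random.default_rng(0)
A = np.concatenate([rng.uniform(0, 1, 999), [2.0]])  # real set, outlier at 2
B = np.concatenate([rng.uniform(2, 3, 999), [1.0]])  # fake set, outlier at 1
print('precision =', coverage(A, B))  # 1.0: each outlier's k-NN ball
print('recall    =', coverage(B, A))  # 1.0: swallows the other set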
-
Number of samples
- For A = 1000 real images from celeba_hq and B = 4 images among A,
precision = 1 and recall = 0.638.
Wow, 4 images cover 64% of 1000 images!
- The manifold estimate becomes inaccurate when the number of samples is small: with only 4 samples and k = 3, each sample's k-NN radius reaches the farthest of the other three, so the estimated manifold balloons (see the sketch below).
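The radius blow-up behind this is easy to see in one dimension (the real experiment runs on deep features, where the effect is milder, but the mechanism is the same):

import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0, 1, 1000)          # stand-in for the 1000 real images
B = rng.choice(A, 4, replace=False)  # 4 "generated" samples drawn from A

# with k = 3 and only 4 samples, each point's k-NN radius inside B is the
# distance to the farthest of the other three, so the balls cover nearly
# the whole data range
d = np.abs(B[:, None] - B[None, :])
d.sort(axis=1)
radii = d[:, 3]
covered = (np.abs(A[:, None] - B[None, :]) <= radii[None, :]).any(axis=1)
print('recall ~', covered.mean())    # close to 1 in this 1-D toy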
-
Not getting close to 1 given two sets of real images
- For A = 1000 real images from celeba_hq and B = another 1000 real images from the same dataset,
precision = 0.639 and recall = 0.661.
They are not close to 1 even though both sets are sampled from the same distribution (= the dataset).
- This happens because image data is in general extremely sparse: in the high-dimensional feature space, 1000 samples leave much of the distribution outside the k-NN balls, so held-out samples from the same distribution often fall outside the estimated manifold.
We thank Tuomas for the enjoyable discussion.
Official implementation: https://github.com/kynkaat/improved-precision-and-recall-metric