Successor to the original Synthetic Line Art (SYNLA) Dataset.
Improvements:
- Huge dataset (~10GB), 65536 high-quality 256x256 color images
- Color gradients for lines and background
- Improved data augmentation
- Linear/correct color blending
- Better resampling and reduced artifacts
- Contains real images as background (DIV2K + random anime images)
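To illustrate what "linear/correct color blending" refers to, here is a minimal sketch of alpha blending done in linear light instead of directly on sRGB-encoded values. It is not the generator's code (which is not public). The function names `srgb_to_linear`, `linear_to_srgb` and `blend_linear` are hypothetical, and the standard sRGB transfer function is assumed.

```python
import numpy as np

def srgb_to_linear(c):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Encode linear-light values in [0, 1] back to sRGB."""
    c = np.clip(np.asarray(c, dtype=np.float64), 0.0, 1.0)
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

def blend_linear(fg, bg, alpha):
    """Alpha-blend two sRGB colors in linear light, then re-encode to sRGB."""
    mixed = alpha * srgb_to_linear(fg) + (1.0 - alpha) * srgb_to_linear(bg)
    return linear_to_srgb(mixed)

black = np.array([0.0, 0.0, 0.0])
white = np.array([1.0, 1.0, 1.0])

# Naive blend on encoded values gives 0.5 per channel, which looks too dark.
naive = 0.5 * black + 0.5 * white
# Linear-light blend gives roughly 0.735 per channel.
correct = blend_linear(black, white, 0.5)
print(naive, correct)
```

Blending encoded values directly tends to darken edges where a dark line meets a bright background, which is the kind of artifact linear blending avoids.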
The background source images used to generate this dataset may or may not be copyrighted; however, their use is justified by:
Canadian Copyright Act (R.S.C., 1985, c. C-42), 29 - Fair dealing for the purpose of research, private study, education, parody or satire does not infringe copyright. (Country of issue)
U.S. Code Title 17. - COPYRIGHTS (17 U.S. Code § 107), [...] the fair use of a copyrighted work, [...] for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. (Country of provider)
- The use is for nonprofit educational/research purposes.
- Original images cannot be recovered from this dataset without significant effort and redrawing, which makes the dataset nonrepresentative of the original works.
- Effort is made to use as little of each copyrighted work as possible. The goal is a large variance in image content, so only a very small amount of each of many individual works was used.
- There are no public alternatives for high quality line art datasets.
- It is easier to distribute the original work as intact images rather than distributing them within this dataset. The impact of the dataset's distribution on the original work is minimal and the dataset does not facilitate/promote unauthorized distribution of originals.
- As the dataset obfuscates large amounts of the original images, negative financial/market impact on the artist/creator is minimal.
This dataset is designed to simulate complex line art. It is useful for training machine learning models that perform any of the following:
- Super-Resolution/Deblurring
- Denoising
- Artifact removal (de-ringing, non-gaussian degradation, etc.)
- Inpainting
- User-Guided Colorization
- Style Transfer
- And more...
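For the restoration tasks listed above, training pairs are typically built by degrading a clean image and using the original as the target. The sketch below shows one simple possibility, assuming a 2x box downsample plus Gaussian noise. The function `make_training_pair` and its parameters are hypothetical and are not part of the dataset, and the random array only stands in for a real 256x256 image.

```python
import numpy as np

def make_training_pair(clean, scale=2, noise_sigma=0.02, rng=None):
    """Return (degraded, clean) for a float image of shape (H, W, C) in [0, 1].

    The degraded image is box-downsampled by `scale` and perturbed with
    Gaussian noise, then clipped back to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = clean.shape
    if h % scale or w % scale:
        raise ValueError("image height and width must be divisible by scale")
    # Average each scale x scale block of pixels (box filter downsampling).
    low = clean.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
    noisy = low + rng.normal(0.0, noise_sigma, size=low.shape)
    return np.clip(noisy, 0.0, 1.0), clean

# Stand-in for one dataset image, already loaded and scaled to [0, 1].
rng = np.random.default_rng(0)
clean = rng.random((256, 256, 3))
degraded, target = make_training_pair(clean, rng=rng)
print(degraded.shape, target.shape)
```

A model for super-resolution or denoising would take `degraded` as input and be trained to reproduce `target`. Other degradations (blur, ringing, masking for inpainting) could be substituted in the same way.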
Most line art is licensed and copyrighted. Using private datasets discourages reproducibility of results. This dataset offers an open alternative and is released under the MIT license.
Three color datasets are available. The full dataset contains 65536 (2^16) images of size 256x256. All images were generated using the images in the folder /Generator_Images, which is also public, allowing custom generation.
Smaller preview datasets (1024 and 4096 images) are also available. They are mutually exclusive with the full dataset and can be used as validation/test datasets.
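When using a preview dataset for validation, it may be worth confirming locally that it shares no files with the training set. The following is a rough sketch that compares SHA-256 digests of file contents. The helper names and any folder paths passed to them are hypothetical, and this only detects byte-identical files, not near-duplicates.

```python
import hashlib
from pathlib import Path

def file_hashes(folder):
    """Map SHA-256 digest -> path for every file found under `folder`."""
    hashes = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[digest] = path
    return hashes

def shared_files(folder_a, folder_b):
    """Return (path_a, path_b) pairs whose file contents are byte-identical."""
    a = file_hashes(folder_a)
    b = file_hashes(folder_b)
    return [(a[h], b[h]) for h in sorted(a.keys() & b.keys())]
```

An empty list from `shared_files(train_dir, preview_dir)` is consistent with the two sets being disjoint at the file level.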
The code used to generate the images is not yet public.