I love manga but can't read Japanese. Google Translate isn't great at localizing Japanese text and doesn't offer a free OCR-plus-translation solution, so I decided to build something to help me translate manga into English more efficiently. The technology for detecting speech bubbles could also help official translators translate manga faster. Sadly, I couldn't find a free, publicly available dataset to train my speech bubble detector on, so I made this.
This repository contains the associated files and links to create an artificial manga panel dataset.
Here's a sample of an image created with this code:
If you just want to use the dataset and not change anything you can find it at
If you'd like to change the way the creator works, make your own files, or contribute to the project, please follow these instructions:
- Libraqm is required for rendering CJK text properly. Follow instructions here
pip3 install -r requirements.txt
You can get the base materials for the dataset by emailing me for the key and then running:
export GOOGLE_APPLICATION_CREDENTIALS=config/ampd_key.json
dvc pull
- Base materials here: https://www.kaggle.com/aasimsani/ampd-base. Just create a `datasets/` folder and place the contents of the Kaggle repo in it.
- In case you want to modify individual scripts for scraping or cleaning this downloaded data, you can find them in `main.py`.
- Before you start, run `python3 main.py --run_tests` to make sure you have all the libraries installed and things are working fine.
- Now you can run `python3 main.py --generate_pages N` to make pages.
- You can also run the metadata generation (`python3 main.py --create_page_metadata N`) and the page rendering (`python3 main.py --render_pages`) separately. The render pages call reads the `datasets/page_metadata/` folder to find files to render.
- You can modify `preprocessing/config_file.py` to change how the generator renders various parts of the page.
Steps:
- Find relevant Japanese dialogue dataset
- Find manga-like Japanese fonts
- Find different text bubble types
- Find manga images or other black and white images to use to fill panels
- Create a few manga page layout templates
- Create manga panels by combining the above elements
- Create font transformations
- Replace layout templates with manga panel generator
- Upload dataset to Kaggle
- Create a custom speech bubble creator (Reach goal)
- 196 fonts with >80% character coverage
- 91 unique speech bubble types
- 2,801,388 sentence pairs in Japanese and English
- 337,039 illustrations
- Downloaded JESC dataset to get sentence pairs of English and Japanese
- Found fonts from fonts website mentioned below
- Downloaded Tagged Anime Illustrations dataset from Kaggle
- Found and created different types of speech bubbles. Tagged which parts you can render text within.
- Verified which fonts were viable and could cover at least 80% of the characters in the JESC dataset
- Converted all the images to black and white
- Created default layout set/layouting engine to create pages
- Created metadata for these pages from the layouting engine and populated each panel with:
- Which image the panel contains
- Which text bubble is associated with it and its metadata (font, text and render data)
- Dumped each page and its panels' metadata to JSON in parallel.
- Used the renderer to create the dataset from the generated JSON in parallel.
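The font viability check described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names are hypothetical, "coverage" is assumed to mean the share of distinct non-whitespace characters in the corpus, and each font is assumed to be represented by the set of code points it supports (which could be read from the font's character map, for instance with fontTools' `TTFont.getBestCmap()`).

```python
def character_coverage(sentences, supported_codepoints):
    """Fraction of distinct non-whitespace characters in `sentences`
    whose code point appears in `supported_codepoints`."""
    needed = {ch for line in sentences for ch in line if not ch.isspace()}
    if not needed:
        return 1.0  # nothing to draw, so nothing is missing
    covered = sum(1 for ch in needed if ord(ch) in supported_codepoints)
    return covered / len(needed)


def viable_fonts(sentences, fonts, threshold=0.8):
    """Return the names of fonts that meet the coverage threshold.

    `fonts` maps a font name to the set of code points it supports
    (hypothetical structure, chosen for this sketch).
    """
    return sorted(
        name
        for name, codepoints in fonts.items()
        if character_coverage(sentences, codepoints) >= threshold
    )


if __name__ == "__main__":
    # Tiny stand-in corpus; the real check would run over the JESC sentences.
    corpus = ["こんにちは", "ありがとう"]
    demo_fonts = {
        "font_a": {ord(c) for c in "こんにちはありがとう"},
        "font_b": {ord(c) for c in "こんにちは"},
    }
    print(viable_fonts(corpus, demo_fonts))
```

Counting distinct characters treats rare kanji the same as common kana. Weighting by how often each character occurs would be an equally reasonable reading of the 80% rule.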
- Each manga page image is represented by a Page object, which is a special type of Panel object. A Page has child panels, and those in turn have sub-panels, forming a tree.
- Each Page has N panels, which are determined by segmenting the page into rectangles as follows:
- First, a top-level set of panels is created, e.g. the page is divided into 2 rectangles.
- Then, based on which type of layout is selected, one or both of the panels are further subdivided into panels. e.g. I want 4 panels on this page, so I can divide two panels into two, or divide one panel into three and leave one as is, etc.
- These "formulas" of layouts for pages are hard coded per number of panels for now.
- In addition to this, the division of panels into sub-panels is not equal: the panels are subdivided randomly along one axis, e.g. one panel can have 30% of the area and the other 70%.
- These panels as they are being subdivided are entered as children of a parent panel resulting in a tree originating at the Page as the root.
- Once this is done, the panels are put through various affine transforms and slicing to produce the iconic "Manga Panel"-like layout. Refer to the sample image above.
- After the transformations, the panels are then shrunk in size to create panel boundaries which are visible.
- Once shrinking is done, there's a chance of adding a background to the whole page and subsequently removing a panel or two randomly to create a white space or a foreground effect
- Once this is done each panel is then populated with a background image which is selected randomly and a number of speech bubbles are created as follows:
- First a template image for a speech bubble is selected out of the 91 base templates. This template is then put through a series of transformations. (flipping it horizontally/vertically, rotating it slightly, inverting it, stretching it along the x or y axis)
- Along with this the tagged writing area within the bubble is also transformed
- Once this is done, a font, a random font size and a piece of text are selected, and the text is resized so that it can be rendered onto the bubble either top to bottom or left to right, depending on a user-defined probability
- After this the metadata is written into a JSON file
- The creation of one page happens sequentially and is wrapped in a single function, which allows pages to be dumped to JSON in parallel
- Once the JSON files are dumped, the folder where they were dumped is scanned, and then each file is loaded again via a load_data method in the Page class which recreates the data. This is then subsequently rendered by each page class's render method. This operation is done concurrently and in parallel for speed.
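To make the panel tree and the JSON round trip more concrete, here is a minimal sketch under stated assumptions. It is not the repository's Page/Panel implementation: `SketchPanel`, `four_panel_page`, the 1700x2400 page size and the 30% to 70% split range are all hypothetical, and the affine transforms, shrinking, backgrounds and speech bubbles are left out.

```python
import json
import random
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SketchPanel:
    """Axis-aligned rectangle that may hold child panels (hypothetical class)."""
    x: float
    y: float
    w: float
    h: float
    children: List["SketchPanel"] = field(default_factory=list)

    def split(self, axis, rng):
        """Cut this panel into two unequal children along one axis."""
        ratio = rng.uniform(0.3, 0.7)  # assumed range, e.g. a 30% / 70% split
        if axis == "v":  # vertical cut: children sit side by side
            first = SketchPanel(self.x, self.y, self.w * ratio, self.h)
            second = SketchPanel(self.x + first.w, self.y, self.w - first.w, self.h)
        else:  # horizontal cut: children are stacked
            first = SketchPanel(self.x, self.y, self.w, self.h * ratio)
            second = SketchPanel(self.x, self.y + first.h, self.w, self.h - first.h)
        self.children = [first, second]
        return self.children

    def leaves(self):
        """Panels without children are the ones that would be drawn."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    @classmethod
    def from_dict(cls, data):
        """Rebuild the tree from the nested dict produced by asdict()."""
        kids = [cls.from_dict(child) for child in data["children"]]
        return cls(data["x"], data["y"], data["w"], data["h"], kids)


def four_panel_page(width=1700, height=2400, seed=None):
    """One hard-coded layout formula: halve the page, then halve both halves."""
    rng = random.Random(seed)
    page = SketchPanel(0, 0, width, height)  # the page is the root of the tree
    top, bottom = page.split("h", rng)
    top.split("v", rng)
    bottom.split("v", rng)
    return page


if __name__ == "__main__":
    page = four_panel_page(seed=0)
    dumped = json.dumps(asdict(page))  # metadata as it might be written to disk
    restored = SketchPanel.from_dict(json.loads(dumped))
    print(len(restored.leaves()), "panels restored from JSON")
```

Keeping the layout as plain nested data is what makes the two-stage design workable: metadata generation and rendering only share JSON files, so each stage can be run separately and spread across workers.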
- JESC dataset
- Tagged anime illustrations Kaggle dataset
- Comic book pages Kaggle dataset
- Fonts allowed for commercial use from Free Japanese Fonts - Licences are on individual pages
- Object Detection for Comics using Manga109 Annotations - Used as benchmark
- Speech bubbles PSD file
- Label studio
JESC dataset
```
@ARTICLE{pryzant_jesc_2017,
    author = {{Pryzant}, R. and {Chung}, Y. and {Jurafsky}, D. and {Britz}, D.},
    title = "{JESC: Japanese-English Subtitle Corpus}",
    journal = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint = {1710.10639},
    keywords = {Computer Science - Computation and Language},
    year = 2017,
    month = oct,
}
```