PDF Translator EN-JA

Features

This repository offers an API endpoint that translates English PDF files into Japanese while preserving the original layout. If you use translator.py, the translated PDF files are saved in the ./outputs directory.

To speed up translation, only the text before the "References" section of the PDF file is translated. From that point on, the remainder is copied as is.
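The cutoff can be pictured with a minimal sketch, assuming the PDF text has already been extracted into a list of text blocks in reading order. The function name `split_at_references` and the heading pattern are hypothetical and are not taken from this repository's code.

```python
import re

# Hypothetical heading pattern: a block that consists only of "References",
# optionally preceded by a section number such as "5" or "5.".
REFERENCES_HEADING = re.compile(r"(?:\d+\.?\s+)?references", re.IGNORECASE)


def split_at_references(blocks):
    """Split text blocks into (blocks_to_translate, blocks_to_copy).

    Blocks before the first "References" heading are meant to be translated;
    the heading and everything after it are meant to be copied unchanged.
    If no heading is found, every block is returned for translation.
    """
    for index, text in enumerate(blocks):
        if REFERENCES_HEADING.fullmatch(text.strip()):
            return list(blocks[:index]), list(blocks[index:])
    return list(blocks), []


blocks = ["Abstract", "We propose a method.", "5. References", "[1] Some paper"]
to_translate, to_copy = split_at_references(blocks)
print(len(to_translate), len(to_copy))
```

Matching the whole block rather than searching inside it keeps a sentence that merely mentions "references" from ending translation early.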

For readability, the translated PDF file shows the original PDF page on the left and the translated text on the right (see the image above).
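The side-by-side arrangement can be sketched as plain geometry. This is only an illustration, assuming both halves keep the size of the original page; the names `Rect` and `side_by_side_layout` are hypothetical and do not come from this repository.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in PDF points (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def side_by_side_layout(width, height):
    """Return (combined_page, left_half, right_half) for one source page.

    Assumption: the combined page is twice as wide as the original, with the
    original page on the left half and the translated text on the right half.
    """
    combined = Rect(0, 0, 2 * width, height)
    left = Rect(0, 0, width, height)            # original page
    right = Rect(width, 0, 2 * width, height)   # translated text
    return combined, left, right


# Example with an A4 page measured in points.
combined, left, right = side_by_side_layout(595, 842)
print(combined, left, right)
```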

This repository has some unresolved issues. Pull requests for improvements are always welcome.

Installation

Clone this repository

   git clone https://github.com/discus0434/pdf-translator.git
   cd pdf-translator

Build the Docker image

   make build

Usage

   ./pdf-translator.sh path/to/input.pdf

The translated PDF file is saved in the same directory as the input PDF file, with the suffix _ja.
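As a rough illustration of the naming rule, the sketch below derives the output path from the input path. It assumes the _ja suffix is placed before the .pdf extension, which the text above does not state explicitly; `translated_path` is a hypothetical helper, not part of pdf-translator.sh.

```python
from pathlib import Path


def translated_path(input_pdf):
    """Return the assumed output path: same directory, stem + "_ja"."""
    src = Path(input_pdf)
    # Assumption: "_ja" goes between the file stem and its extension.
    return src.with_name(f"{src.stem}_ja{src.suffix}")


print(translated_path("path/to/input.pdf"))
```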

Requirements

NVIDIA GPU (currently the only supported device)
Docker
Python 3+
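Before building the image, the requirements above can be checked with a short script. This is a minimal sketch, not something shipped with the repository: it treats the presence of the `nvidia-smi` command on the PATH as a stand-in for "an NVIDIA GPU is available", which is an assumption, and `check_requirements` is a hypothetical name.

```python
import shutil
import sys


def check_requirements():
    """Report which of the listed requirements appear to be satisfied."""
    return {
        # Docker CLI found on PATH.
        "docker": shutil.which("docker") is not None,
        # Assumption: nvidia-smi on PATH implies an NVIDIA driver and GPU.
        "nvidia_gpu": shutil.which("nvidia-smi") is not None,
        # Running interpreter is Python 3 or newer.
        "python3": sys.version_info.major >= 3,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```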

License

This repository does not allow commercial use.

This repository is licensed under CC BY-NC 4.0. See LICENSE for more information.

References

PDF-to-text conversion uses a PaddlePaddle model.
Text translation uses the FuguMT model from HuggingFace.
The Docker image is based on paddlepaddle/paddle.
Font files are from Source Han Serif.

TODOs

Make it possible to highlight the translated text
Support M1 Mac or CPU
Implement Gradio UI

Contributors

Thanks to the following people who have contributed to this project:

Akira Ishino: Improvements to the text truncation algorithm
hibit: Implementation of directory input for translator.py
