
pdf-translator translates English PDF files into Japanese, preserving the original layout.


stn/pdf-translator

 
 


PDF Translator EN-JA

Features

This repository offers an API endpoint that translates English PDF files into Japanese, preserving the original layout. If you use translator.py, the translated PDF files are saved in the ./outputs directory.

To speed up translation, only the content before the "References" section of the PDF file is translated. Everything after that point is copied as is.
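The cut-off at "References" can be sketched roughly as follows. This is an illustrative sketch only, not the code in translator.py: the function name `translate_until_references`, the heading pattern, and the page-level granularity (the page that contains the heading is copied whole) are all assumptions, and `translate` stands in for whatever translation model is used.

```python
import re
from typing import Callable, List

# Assumed heading pattern: a line that contains only the word "References".
REFERENCES_HEADING = re.compile(r"^\s*references\s*$", re.IGNORECASE | re.MULTILINE)


def translate_until_references(
    pages: List[str], translate: Callable[[str], str]
) -> List[str]:
    """Translate page texts until a "References" heading appears; copy the rest."""
    output = []
    reached_references = False
    for text in pages:
        # Once the heading has been seen, this page and all later ones are copied.
        if not reached_references and REFERENCES_HEADING.search(text):
            reached_references = True
        output.append(text if reached_references else translate(text))
    return output


# Usage with a placeholder translator that only tags the text.
pages = ["Introduction ...", "Method ...", "References\n[1] ...", "Appendix ..."]
result = translate_until_references(pages, lambda s: f"<ja>{s}</ja>")
```

With the placeholder translator, the first two pages are wrapped in `<ja>` tags and the last two are returned unchanged.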

To improve readability, each page of the translated PDF file shows the original page on the left and the translated text on the right (see the image above).
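The geometry of such a side-by-side page might look like the sketch below. It assumes the output page is exactly twice the width of the source page and that both halves keep the source page size; the names `Rect` and `side_by_side_layout` are hypothetical and do not come from this repository.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in PDF points: (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def side_by_side_layout(width: float, height: float) -> Tuple[Rect, Rect, Rect]:
    """Return (combined page, left slot, right slot) for one source page."""
    combined = Rect(0, 0, 2 * width, height)   # output page, double width
    left = Rect(0, 0, width, height)           # original page goes here
    right = Rect(width, 0, 2 * width, height)  # translated page goes here
    return combined, left, right


# Usage: an A4 page measured in points (595 x 842).
combined, left, right = side_by_side_layout(595, 842)
```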

This repository contains some unresolved issues. Pull requests for improvements are always welcome.

Installation

  1. Clone this repository
   git clone https://github.com/discus0434/pdf-translator.git
   cd pdf-translator
  2. Build the Docker image
   make build

Usage

   ./pdf-translator.sh path/to/input.pdf

The translated PDF file is saved in the same directory as the input PDF file, with the suffix _ja.
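As a rough illustration of the naming rule, the output path could be derived as shown below. The helper `translated_path` is hypothetical, and the sketch assumes the suffix is appended to the file name just before the .pdf extension, which the text above does not state explicitly.

```python
from pathlib import Path


def translated_path(input_pdf: str, suffix: str = "_ja") -> Path:
    """Build the output path next to the input file, adding the suffix to its stem."""
    src = Path(input_pdf)
    # with_name keeps the parent directory and swaps only the file name.
    return src.with_name(f"{src.stem}{suffix}{src.suffix}")


# Usage: the output lands in the same directory as the input.
out = translated_path("path/to/input.pdf")
```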

Requirements

  • NVIDIA GPU (currently the only supported hardware)
  • Docker
  • Python 3+

License

This repository does not allow commercial use.

This repository is licensed under CC BY-NC 4.0. See LICENSE for more information.


TODOs

  • Make it possible to highlight the translated text
  • Support M1 Mac or CPU
  • Implement Gradio UI

Contributors

Thanks to the following people who have contributed to this project:

  • Akira Ishino: Improvements to the text truncation algorithm
  • hibit: Implementation of directory input to translator.py
