Script to Automate Language Translations #3607

Open
vishvamsinh28 opened this issue Jan 25, 2025 · 7 comments

Comments

@vishvamsinh28
Contributor

We need a script that automates translating content from English into multiple languages. The script should take the JSON data, translate its content using an API, and create a folder for each target language. Within these folders, it should save the translated files with the same structure as the originals. This will streamline the translation process, maintain consistency, and simplify managing multi-language content.

@AkshatJain481

I would like to work on this issue to automate the translation process. Here's my proposed plan:

Input Handling: The script will accept a JSON file containing the English content as input.

Translation: I'll use a translation API (e.g., Google Translate or DeepL) to translate the content into multiple target languages.

Please let me know if there are any specific APIs, languages, or features you'd prefer for this solution.

@AkshatJain481

Which API should we use for the translation process? Please confirm so I can proceed with the implementation.

@vishvamsinh28
Contributor Author

vishvamsinh28 commented Jan 25, 2025

Don't start working on this issue yet. We are still discussing it.

@AkshatJain481

OK, let me know when to proceed.

@akshatnema
Member

Hey @AkshatJain481, this is a suggested issue for GSoC. Please describe your approach to solving this issue first, instead of jumping directly into the implementation.

@AkshatJain481

Proposed Approach for Automating JSON Translation

Objective

The script will automate the process of translating JSON-based content from English into multiple target languages using an external translation API. It will maintain the original file structure and ensure smooth management of multilingual content.


Steps to Implement the Solution

1. Read Source JSON Files

  • Identify the source directory (/public/locales/en) that contains English JSON files.
  • Read each .json file and parse its content.
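A minimal sketch of step 1 in TypeScript on Node.js, assuming the English files live under a single root such as `public/locales/en` and may be nested in subfolders. The function name `loadSourceFiles` is hypothetical; it returns each file's parsed content keyed by its path relative to the root, so the same relative path can be reused for every target language.

```typescript
import { readdir, readFile } from "node:fs/promises";
import { join, relative } from "node:path";

// Hypothetical helper: recursively collects every .json file under `root`
// and parses it. Keys are paths relative to `root` (e.g. "tools/index.json").
async function loadSourceFiles(root: string): Promise<Map<string, unknown>> {
  const result = new Map<string, unknown>();

  async function walk(dir: string): Promise<void> {
    for (const entry of await readdir(dir, { withFileTypes: true })) {
      const fullPath = join(dir, entry.name);
      if (entry.isDirectory()) {
        await walk(fullPath); // descend into nested folders
      } else if (entry.isFile() && entry.name.endsWith(".json")) {
        const text = await readFile(fullPath, "utf8");
        try {
          result.set(relative(root, fullPath), JSON.parse(text));
        } catch (err) {
          // Report which file is broken instead of a bare SyntaxError.
          throw new Error(`Invalid JSON in ${fullPath}: ${String(err)}`);
        }
      }
    }
  }

  await walk(root);
  return result;
}
```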

2. Translation Logic

  • Use a translation API (e.g., Google Cloud Translation API, DeepL API, or Microsoft Translator API; not decided yet) to translate text values.
  • Implement recursive translation for nested JSON structures.
  • Ensure proper error handling to manage API failures or rate limits.
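Recursive translation of nested JSON could look like the following sketch. The `translate` callback is a hypothetical placeholder for whichever API is eventually chosen; only string values are translated, while keys, numbers, booleans, and `null` pass through unchanged so the output keeps the source shape.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Walks a parsed JSON value and translates every string leaf.
// `translate` is a stand-in for the (not yet chosen) translation API.
async function translateJson(
  value: Json,
  translate: (text: string) => Promise<string>,
): Promise<Json> {
  if (typeof value === "string") return translate(value);
  if (Array.isArray(value)) {
    const out: Json[] = [];
    for (const item of value) out.push(await translateJson(item, translate));
    return out;
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = await translateJson(value[key], translate); // keys are never translated
    }
    return out;
  }
  return value; // number, boolean or null
}
```

Strings are translated one at a time here for clarity; batching (step 5) would instead collect the string leaves first and send them together.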

3. Create Target Language Directories

  • Generate directories for each target language inside /public/locales/.
  • Maintain the same file structure as the original English files.

4. Write Translated Content to Files

  • Save the translated JSON content into the corresponding language directories.
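Steps 3 and 4 can be combined: build the output path from the language code and the English file's relative path, create any missing folders, then write the JSON. This is a sketch; `writeTranslation` is a hypothetical name, and the 2-space indentation with a trailing newline is an assumed formatting choice.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";

// Hypothetical helper: writes one translated file to
// <localesRoot>/<lang>/<relPath>, creating missing folders first so the
// target tree mirrors the English one. Returns the path it wrote.
async function writeTranslation(
  localesRoot: string,
  lang: string,
  relPath: string,
  data: unknown,
): Promise<string> {
  const outPath = join(localesRoot, lang, relPath);
  await mkdir(dirname(outPath), { recursive: true }); // no error if it exists
  // Assumed formatting: 2-space indent plus a trailing newline.
  await writeFile(outPath, JSON.stringify(data, null, 2) + "\n", "utf8");
  return outPath;
}
```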

5. Performance Optimizations

  • Use batch translation where possible to reduce API calls.
  • Implement caching to avoid re-translating unchanged content.
  • Add concurrency control to process multiple files efficiently while staying within API rate limits.
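For the concurrency control, one dependency-free option is a small worker pool that keeps at most `limit` tasks in flight and preserves result order. The helper `mapWithConcurrency` is hypothetical; an existing npm package such as `p-limit` could serve the same purpose.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight at once and returns results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS runs callbacks on one thread

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start `limit` runners (or fewer if there are fewer items) and wait for all.
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()));
  return results;
}
```

For example, `mapWithConcurrency(files, 3, translateFile)` would process at most three files at a time, where `translateFile` stands for whatever per-file function the script ends up with.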

6. Error Handling & Logging

  • Log any failed translations and provide a fallback mechanism.
  • Skip already translated files unless forced to regenerate them.
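A sketch of the two error-handling points in step 6, assuming "already translated" simply means the target file exists and that the fallback is to keep the English string. Both assumptions, and the names `shouldSkip` and `translateOrFallback`, are illustrative rather than decided.

```typescript
import { access } from "node:fs/promises";

// Assumption: a file counts as "already translated" if it exists.
// With `force` set, everything is regenerated.
async function shouldSkip(targetPath: string, force: boolean): Promise<boolean> {
  if (force) return false;
  try {
    await access(targetPath);
    return true; // target exists: skip it
  } catch {
    return false; // missing or inaccessible: translate it
  }
}

// Assumed fallback: log the failure and keep the English text so the
// output file is still complete. `translate` is a placeholder for the API call.
async function translateOrFallback(
  text: string,
  lang: string,
  translate: (text: string, lang: string) => Promise<string>,
): Promise<string> {
  try {
    return await translate(text, lang);
  } catch (err) {
    console.error(`[${lang}] translation failed, keeping English: ${JSON.stringify(text)}`, err);
    return text;
  }
}
```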

Technology Stack

  • Node.js – For file handling and script execution.
  • Translation API (e.g., Google Cloud Translation API, DeepL API, or Microsoft Translator API; not decided yet) – For language translation.
  • fs/promises – For async file operations.

Additional Considerations

Idempotency – Avoid unnecessary API calls if translations already exist.
Extensibility – Easily add support for new languages in the future.
Configurable – Allow specifying languages dynamically instead of hardcoding.
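To keep the language list configurable, the script could read it from the command line instead of hardcoding it. The flags `--langs` and `--force` and the `parseOptions` helper below are hypothetical, not an agreed interface.

```typescript
interface TranslationOptions {
  langs: string[]; // target language codes, e.g. ["fr", "de"]
  force: boolean; // regenerate files even if they already exist
}

// Parses hypothetical flags such as `--langs=fr,de,es --force`.
function parseOptions(argv: readonly string[]): TranslationOptions {
  const options: TranslationOptions = { langs: [], force: false };
  for (const arg of argv) {
    if (arg === "--force") {
      options.force = true;
    } else if (arg.startsWith("--langs=")) {
      options.langs = arg
        .slice("--langs=".length)
        .split(",")
        .map((code) => code.trim())
        .filter((code) => code.length > 0 && code !== "en"); // English is the source
    }
  }
  if (options.langs.length === 0) {
    throw new Error("No target languages given; pass e.g. --langs=fr,de");
  }
  return options;
}

// Typical call from the script entry point:
// const { langs, force } = parseOptions(process.argv.slice(2));
```

Adding a new language then only means passing one more code, with no change to the script itself.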


Next Steps

Once the approach is approved, I will proceed with implementation, ensuring best practices for efficiency and maintainability.

Would love to hear your feedback on this approach! 🚀

@AST0008

AST0008 commented Feb 2, 2025

Objective

This script automates the process of translating JSON-based content from English into multiple target languages. It maintains the original file structure and optimizes translation efficiency using batch processing, caching, and concurrency.


**Improvements**

Batch Translation – Reduces API calls, improving speed and lowering costs.
Parallel Processing (Concurrency) – Uses async operations for better performance.
Caching Mechanism – Avoids redundant API calls for previously translated content.
Idempotency – Ensures only new/updated content is translated.


**Additional Tech Stack**

  • **Google Cloud Translation API** – For automated translations.
  • Redis or JSON-based cache – To store translated content and prevent redundant API calls.
  • Worker Threads or Async Processing – Handles multiple translations in parallel.

1️⃣ Read Source JSON Files Efficiently

  • Read files from /public/locales/en.
  • Use asynchronous file operations (fs.promises) for parallel reading.
  • Detect changes before translating using file hashing (to avoid redundant API calls).
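Change detection by hashing could be sketched with Node's built-in `crypto` module: hash each English file, compare against the hashes stored after the previous run, and translate only files whose hash differs. The manifest of previous hashes, where it is stored, and the helper names are assumptions.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of a file's text content, as a hex string.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// `current` maps relative paths to file contents; `previous` maps relative
// paths to hashes recorded on the last run (hypothetical manifest, e.g. a
// JSON file saved next to the locales). Returns the paths that need work.
function changedFiles(current: Record<string, string>, previous: Record<string, string>): string[] {
  return Object.keys(current).filter(
    (relPath) => previous[relPath] !== hashContent(current[relPath]),
  );
}
```

New files have no recorded hash, so they are reported as changed as well.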

2️⃣ Use Efficient Batch Translation

  • Group text into batches and send them in a single API request.
  • Reduce latency and costs by leveraging batch processing.
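Batching needs a way to split the strings into request-sized groups. The limits below (`maxItems`, `maxChars`) and the helper name `makeBatches` are placeholders; the real values depend on the API and plan that are eventually chosen.

```typescript
// Splits `texts` into batches with at most `maxItems` entries and at most
// `maxChars` characters in total. A single string longer than `maxChars`
// still gets its own batch rather than being dropped.
function makeBatches(texts: readonly string[], maxItems: number, maxChars: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentChars = 0;
  for (const text of texts) {
    const wouldOverflow = current.length >= maxItems || currentChars + text.length > maxChars;
    if (current.length > 0 && wouldOverflow) {
      batches.push(current); // close the current batch and start a new one
      current = [];
      currentChars = 0;
    }
    current.push(text);
    currentChars += text.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Assuming the API returns results in input order, translations can be matched back to their source strings by position within each batch.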

3️⃣ Implement Caching for Translations

  • Store translations in a local JSON file or Redis.
  • If a translation already exists, reuse it instead of making a new API request.
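A JSON-file cache can be modeled as a plain object keyed by language and source text, so only cache misses reach the API and repeated strings are sent once. `translateBatch` is a hypothetical wrapper around the chosen API, assumed to return translations in input order; loading and saving the object (e.g. with `JSON.stringify`), or swapping in Redis, is left out of the sketch.

```typescript
// Keys combine language and source text; values are translations.
// Persisting this object to a JSON file is assumed to happen elsewhere.
type TranslationCache = Record<string, string>;

async function translateWithCache(
  texts: readonly string[],
  lang: string,
  cache: TranslationCache,
  translateBatch: (texts: string[], lang: string) => Promise<string[]>, // hypothetical API wrapper
): Promise<string[]> {
  const key = (text: string) => `${lang}\u0000${text}`; // NUL separator avoids collisions
  const has = (text: string) => Object.prototype.hasOwnProperty.call(cache, key(text));

  // Unique strings that have never been translated into `lang`.
  const misses = Array.from(new Set(texts.filter((text) => !has(text))));
  if (misses.length > 0) {
    const translated = await translateBatch(misses, lang);
    if (translated.length !== misses.length) {
      throw new Error("translateBatch returned a different number of results");
    }
    misses.forEach((text, i) => {
      cache[key(text)] = translated[i];
    });
  }
  return texts.map((text) => cache[key(text)]);
}
```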

4️⃣ Maintain Folder & File Structure

  • Generate a target directory for each language (/public/locales/<lang>).
  • Maintain the same structure as the English files.

5️⃣ Concurrency for Faster Processing

  • Use Promise.all() or Worker Threads to translate multiple files in parallel.
  • Limit requests per second to respect API rate limits.
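To limit requests per second, a simple fixed-interval limiter can hand out start times at least `1000 / perSecond` ms apart. This is an illustrative sketch with a hypothetical name (`createRateLimiter`); the actual rate depends on the chosen API's quota.

```typescript
// Returns a function to `await` before each API request. Each call reserves
// the next free slot synchronously, so concurrent callers are spaced
// at least 1000 / perSecond ms apart.
function createRateLimiter(perSecond: number): () => Promise<void> {
  const interval = 1000 / perSecond;
  let nextSlot = 0; // earliest time (ms since epoch) the next request may start

  return async () => {
    const now = Date.now();
    const wait = Math.max(0, nextSlot - now);
    nextSlot = Math.max(now, nextSlot) + interval;
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  };
}

// Usage: const limit = createRateLimiter(5); then `await limit();` before each request.
```

A cap on how many requests run in parallel is still needed separately; the limiter only controls how quickly requests start.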

6️⃣ Error Handling & Logging

  • Log failed translations with error messages.
  • Implement retry logic for API failures.
  • Skip already translated files unless forced.
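Retry logic for API failures is commonly done with exponential backoff. In the sketch below every error is retried; a real implementation would likely retry only transient failures such as rate-limit or server errors, which depends on the API client chosen. The helper name `withRetry` and its defaults (`maxAttempts = 3`, `baseDelayMs = 500`) are arbitrary assumptions.

```typescript
// Retries `operation` up to `maxAttempts` times, doubling the delay after
// each failure (500 ms, 1000 ms, ... with the defaults). Rethrows the last
// error if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${attempt}/${maxAttempts} failed:`, err); // logged for the failure report
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```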

I think this approach would be better: instead of translating line by line, we can send the text in batches.
Let me know if any changes are required.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants