A Python Django Service for Translation
This is a Python-based Django service that provides endpoints for creating translations and retrieving a user's translations.
The design decisions for this service are based on the discussion here.
Excalidraw Flow Diagram: https://excalidraw.com/#json=yyBV76AJi0DZ0Gg7oHn5a,JCzDkcdZk1INg8QId7b5Dw
The project requires:
- Python 3.10.8
- pyenv
- Docker
- Docker Desktop
- Poetry (version 1.6.1)
Install pyenv
pyenv is a tool for managing Python versions; use it to install Python 3.10.8.
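To confirm that the interpreter pyenv selected matches the pinned version, a small check like the one below can be imported and called. This is a minimal sketch: the `is_required_version` helper and the `REQUIRED_VERSION` constant are assumptions for illustration, not part of this repository.

```python
import sys
from typing import Optional, Tuple

# Assumption: mirrors the "Python 3.10.8" requirement listed above.
REQUIRED_VERSION: Tuple[int, int, int] = (3, 10, 8)


def is_required_version(
    version: Optional[Tuple[int, int, int]] = None,
    required: Tuple[int, int, int] = REQUIRED_VERSION,
) -> bool:
    """Return True if (major, minor, micro) equals the pinned version."""
    if version is None:
        # No argument: check the interpreter currently running this code.
        version = tuple(sys.version_info[:3])
    return tuple(version) == tuple(required)
```

Calling `is_required_version()` with no arguments checks the running interpreter, so it should return `True` once pyenv has selected Python 3.10.8.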
# install docker
brew install docker
brew link docker # optional
# install docker-compose
brew install docker-compose
First, create a .env file and copy into it the secrets that have been shared with you.
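Before starting the containers, it can help to check that the .env file actually contains every secret. The sketch below parses simple KEY=VALUE lines and reports missing or empty keys. The names in `REQUIRED_KEYS` are hypothetical placeholders, since the actual secrets are not listed here; replace them with the names you were given.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical placeholder names: substitute the keys of the shared secrets.
REQUIRED_KEYS = ("SECRET_KEY", "TRANSLATION_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and quote characters from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not env.get(key)]


def check_env_file(path: str = ".env") -> List[str]:
    """Read the .env file at `path` and return its missing required keys."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))
```

An empty list from `check_env_file()` means every required key has a value. This parser deliberately ignores features such as `export` prefixes and multi-line values; a library such as python-dotenv covers more of the format.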
# via docker
docker-compose up -d
docker-compose exec service bash
make create-migration
make migrate
make create-superuser
make run
The application is accessible at http://localhost:8000.
The API definitions are available at http://localhost:8000/redoc.
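Because the server may need a few seconds to start after `make run`, a script that waits for it can be handy, for example before running manual smoke tests. A minimal sketch, assuming the /redoc page above is a reasonable readiness probe; `wait_for_service` is a hypothetical helper, not part of the service.

```python
import time
import urllib.error
import urllib.request

# Bypass any HTTP proxy configured in the environment: the service is local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_service(url: str = "http://localhost:8000/redoc", attempts: int = 30,
                     delay: float = 1.0, timeout: float = 2.0) -> bool:
    """Poll `url` until the server answers or the attempts run out."""
    for attempt in range(attempts):
        try:
            with _OPENER.open(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # Any HTTP status, even 4xx/5xx, means the server is listening.
            return True
        except OSError:
            # Connection refused or timed out: the app is not up yet.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

`HTTPError` is caught before `OSError` because it is a subclass of it; the order keeps "server answered with an error" distinct from "nothing is listening".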
pytest runs all the unit tests in the codebase.
By pytest convention, all files matching test_*.py or *_test.py are collected.
docker-compose exec service bash
poetry run pytest -v
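For illustration, a test module that pytest would collect might look like the sketch below (saved, for example, as test_payload.py). `validate_translation_payload` is a hypothetical helper written for this example from the fields in the translate request in the usage steps; the service's real validation may differ.

```python
import uuid
from typing import Any, Dict, List

# Assumption: "html" is the only format used in the example request;
# the real service may accept others.
SUPPORTED_FORMATS = {"html"}


def validate_translation_payload(payload: Dict[str, Any]) -> List[str]:
    """Hypothetical validator: return error messages (empty list if valid)."""
    errors = []
    try:
        uuid.UUID(str(payload.get("user_id", "")))
    except ValueError:
        errors.append("user_id must be a UUID")
    if payload.get("format") not in SUPPORTED_FORMATS:
        errors.append("unsupported format")
    if not payload.get("original_content"):
        errors.append("original_content is required")
    return errors


# pytest collects plain functions named test_* inside files named test_*.py.
def test_valid_payload():
    payload = {
        "user_id": "119c40ef-2e9f-4992-9433-94ef99daeb19",
        "format": "html",
        "original_content": "<p>hallo</p>",
    }
    assert validate_translation_payload(payload) == []


def test_missing_user_id_is_rejected():
    payload = {"format": "html", "original_content": "<p>hallo</p>"}
    assert "user_id must be a UUID" in validate_translation_payload(payload)
```

Running `poetry run pytest -v` inside the container would pick up both test functions; no pytest import is needed because plain `assert` statements are enough.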
- Create the Django tables and a superuser:
docker-compose up -d
docker-compose exec service bash
make create-migration
make migrate
make create-superuser
make run
- Log in to the Django admin at http://localhost:8000/admin.
- Create an API key.
- Call the translate endpoint, replacing {API_KEY} with the key you created:
curl --location 'localhost:8000/v1/translation/translate' \
--header 'Content-Type: application/json' \
--header 'Authorization: Api-Key {API_KEY}' \
--data '{
"user_id": "119c40ef-2e9f-4992-9433-94ef99daeb19",
"format": "html",
"original_content": "<div><h2 class='\''editor-heading-h2'\'' dir='\''ltr'\''><span>hallo1 as headline</span></h2><p class='\''editor-paragraph'\'' dir='\''ltr'\''><br></p><p class='\''editor-paragraph'\'' dir='\''ltr'\''><span>hallo2 as paragraph</span></p><p class='\''editor-paragraph'\'' dir='\''ltr'\''><span>hallo3 as paragraph with </span><b><strong class='\''editor-text-bold'\''>bold</strong></b><span> inline</span></p></div>"
}
'
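The same request can be built from Python with only the standard library. A minimal sketch, assuming the local URL and endpoint path from the curl command and that the API replies with JSON; `build_translate_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: same local URL and endpoint path as the curl example above.
TRANSLATE_URL = "http://localhost:8000/v1/translation/translate"


def build_translate_request(api_key: str, user_id: str, html: str,
                            url: str = TRANSLATE_URL) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    body = json.dumps({
        "user_id": user_id,
        "format": "html",
        "original_content": html,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the body as JSON (assumes a JSON reply)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the body with `json.dumps` avoids the `'\''` quote escaping that the shell version needs around the HTML attributes.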
- Does not currently translate time and address tags. TODO: translate time and address tags as well.
- Cannot translate text on buttons.
- Instead of treating
<span>hallo3 as paragraph with </span><b><strong class='editor-text-bold'>bold</strong></b><span> inline</span>
as a single block for translation, the current implementation splits it into separate segments, each translated on its own:
<span>hallo3 as paragraph with </span>
<b><strong class='editor-text-bold'>bold</strong></b>
<span> inline</span>
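To make the difference concrete, the sketch below uses Python's standard `html.parser` to contrast per-text-node segmentation, which mirrors the splitting described above, with grouping all text inside a block-level element into one segment. It is an illustration, not the service's implementation; the `BLOCK_TAGS` set and the helper names are assumptions.

```python
from html.parser import HTMLParser
from typing import List

# Assumption: tags treated as translation units. Inline tags (span, b, strong)
# are not in the set, so their text joins the enclosing block.
BLOCK_TAGS = {"div", "p", "h1", "h2", "h3", "h4", "h5", "h6", "li"}


class TextNodeCollector(HTMLParser):
    """One segment per non-blank text node (the splitting described above)."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.segments.append(data)


class BlockCollector(HTMLParser):
    """One segment per block element, joining the text of its inline children."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[str] = []
        self._buffer: List[str] = []

    def _flush(self) -> None:
        text = "".join(self._buffer)
        if text.strip():
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text outside a block element


def split_text_nodes(html: str) -> List[str]:
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.segments


def group_by_block(html: str) -> List[str]:
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.segments
```

For the paragraph above, `split_text_nodes` yields three segments while `group_by_block` yields the single sentence "hallo3 as paragraph with bold inline". Grouping gives the translator the whole sentence as context, but it drops the inline `<b>`/`<strong>` markup, so a real implementation would need to carry that markup through translation (for example as placeholders) and restore it afterwards.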
- The service is very CPU-intensive. How should we determine the resource specifications for our pods based on this, and what scaling strategy should the pods use?
- How many threads are being created for each HTML input?
- What is the bottleneck of the service?
- The operating system limits the number of threads a process can create. What happens with very large HTML files when the number of threads needed exceeds that limit?
- Each thread typically needs 1-2 MB of memory for its stack.
- Can we batch the translation of HTML tags instead of creating one thread per text segment?
- Do we need to pre- and post-process the HTML content to preserve the structure of the website?
- A multi-threaded solution is faster than a multi-process one because our use case is I/O-bound: creating a thread is cheaper than creating a process, and Python's GIL is released while a thread waits on I/O.
- For the example input above, translation creates a total of 11 threads.
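The thread-limit and batching questions above point toward a bounded pool: the number of threads is fixed by `max_workers` instead of growing with the number of text segments, and each task translates a batch rather than a single segment. A minimal sketch, assuming a hypothetical `translate_batch` callable (for example, a wrapper around whatever translation backend the service calls) that accepts a list of strings and returns their translations in the same order; `batch_size` and `max_workers` are illustrative defaults to tune, not measured values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

# Hypothetical callable: translates a batch, returning results in the same order.
TranslateBatch = Callable[[List[str]], List[str]]


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_segments(segments: Sequence[str], translate_batch: TranslateBatch,
                       batch_size: int = 20, max_workers: int = 8) -> List[str]:
    """Translate segments using at most `max_workers` threads, whatever the input size."""
    batches = list(batched(segments, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so segment order is preserved.
        translated = pool.map(translate_batch, batches)
        return [text for batch in translated for text in batch]
```

With this shape, a huge HTML file produces more batches, not more threads, so neither the OS thread limit nor the per-thread stack cost scales with input size. This suits work that mostly waits on I/O; if the per-batch work turned out to be CPU-bound, `ProcessPoolExecutor` offers the same `map` interface.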