Support for Arabic/Persian text #478

carlhiggs · 2024-09-03T01:03:53Z

We have been attempting to support Persian and Arabic text; however, feedback received suggests there is more work to do.

Our colleague Mohammad Sadegh Anisi contributed preliminary translations and provided advice on preparing draft reports in Persian/Farsi. He reported several issues via e-mail, which I will record below:

Picture 1: In formal writing, we adhere to the use of half-space. I implemented this in the report configuration file, but as highlighted in the picture, it caused repeated errors throughout the document.

Picture 2: It would be better to relocate the objects in RTL format to make them more understandable for RTL users.

Picture 3: The word "low" is misplaced and could mislead readers.

Picture 4: As marked, some words are written backward, and this issue also continues on other pages.

carlhiggs · 2024-09-03T02:11:18Z

Preliminary thoughts I shared by e-mail with Mohammad:

Regarding representation of half-spaces (problem/picture 1)
While it isn't clear to me from the picture what the error is, I wonder if this relates to the issue described in this Stack Overflow post about representing the Unicode \u200c character.

I see \u200c (zero width non-joiner, ZWNJ) is the character you have used in your translation, and I suspect its use may not be supported by the software we use for preparing PDFs, PyFPDF (fpdf2). I searched the fpdf2 issues and code and couldn't find a reference to that Unicode character or its description.

So, my thoughts on an approach to address this are:

Address on our side:
- Would it be valid if I replaced \u200c with a space character, or something similar?
- Something simple like this would be the quickest approach.

Address in the upstream fpdf2 software:
- That might be ideal, but it is beyond my capacity: because I can't read Persian, I don't fully understand the problem.
- If you were interested, you could lodge an issue on the PyFPDF/fpdf2 GitHub site and, if you wished, contribute code to address it. Here is an example of an issue I lodged for word wrapping in Thai.
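As a minimal sketch of the "address on our side" option, the substitution could look like the following. The function name `replace_zwnj` and its default replacement are hypothetical, and whether a plain space is an acceptable stand-in for the half-space in formal Persian writing is exactly the open question above.

```python
# U+200C ZERO WIDTH NON-JOINER, used for the Persian "half-space".
ZWNJ = "\u200c"


def replace_zwnj(text: str, replacement: str = " ") -> str:
    """Return text with every ZWNJ swapped for `replacement`.

    Hypothetical helper: the replacement character is an assumption
    and would need confirming with a Persian reader.
    """
    return text.replace(ZWNJ, replacement)
```

This could be applied to each translated phrase just before it is passed to the PDF writer, leaving the translation source itself unchanged.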

Regarding RTL legend graphics (problem/picture 2)
This should be doable by introducing some conditional template formatting for right-to-left (RTL) languages.

The other problems that are more clearly errors in representation may be a higher priority, but I will look to find time to make this change in the coming fortnight.
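One way to relocate template objects for RTL layouts is to mirror each element's horizontal position across the page. The sketch below is only an illustration: `mirror_x` is a hypothetical helper, and it assumes each element is described by a left edge and a width in the same units as the page width.

```python
def mirror_x(x: float, width: float, page_width: float) -> float:
    """Return the left edge of an element after mirroring it horizontally.

    An element that sat `x` units from the left margin ends up
    `x` units from the right margin instead.
    """
    return page_width - x - width


# Example: a 50 mm wide legend placed 10 mm from the left of a
# 210 mm wide page moves to 10 mm from the right.
mirrored_left_edge = mirror_x(10, 50, 210)  # 150
```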

Regarding the misplaced 'low' on legend (problem/picture 3)
I believe this is a similar issue to picture 2, but with a simpler fix. For RTL languages, we need to change the format or perhaps the alignment. I suspect that, for whatever reason, the word is being right-aligned within its text box instead of left-aligned. In any case, I think we can fix this. Perhaps similar conditional logic can be used to address both problems 2 and 3.
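The conditional logic could be as simple as choosing between two template values based on the report language. This is a sketch under stated assumptions: the `RTL_LANGUAGES` set, the language names in it, and the `pick_by_direction` helper are all hypothetical, and which alignment value is actually correct for the 'low' label would need checking against a rendered report.

```python
# Hypothetical list of report languages written right-to-left.
RTL_LANGUAGES = {"Arabic", "Persian", "Farsi", "Hebrew", "Urdu"}


def is_rtl(language: str) -> bool:
    """Return True if the named report language is written right-to-left."""
    return language in RTL_LANGUAGES


def pick_by_direction(language: str, ltr_value, rtl_value):
    """Choose a template setting according to the language's direction."""
    return rtl_value if is_rtl(language) else ltr_value


# Example: select an alignment code for a legend label.
# "L" and "R" are the left/right alignment codes used by fpdf2 cells.
align = pick_by_direction("Persian", ltr_value="L", rtl_value="R")
```

The same helper could select mirrored coordinates or alternative template elements, so problems 2 and 3 would share one switch.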

Regarding text written backwards (problem/picture 4)
The text on most figure legends like this one (but not those in pictures 2 and 3) is produced not with fpdf2 but with Matplotlib, the Python plotting library. I suspect this could be a problem for RTL languages and may be challenging for me to address. I found some example code for correct representation of Persian/Arabic in Python plots here. I suspect we may need to do something similar. Another approach would be, for RTL scripts (or perhaps all languages), to produce legends using fpdf2 as we do in your pictures 2 and 3.
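A commonly used approach for Persian/Arabic labels in Matplotlib is to reshape the letters into their joined forms and then reorder them for display before passing the string to the plot. The sketch below assumes the third-party packages `arabic-reshaper` and `python-bidi` are installed. The helper names `contains_rtl` and `prepare_label` are hypothetical, and this has not been tested against our figures.

```python
import unicodedata


def contains_rtl(text: str) -> bool:
    """Return True if any character has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def prepare_label(text: str) -> str:
    """Return a label string suitable for a renderer without bidi support.

    Left-to-right text is returned unchanged. RTL text is reshaped and
    reordered; this requires arabic-reshaper and python-bidi (assumed
    installed), which are imported only when needed.
    """
    if not contains_rtl(text):
        return text
    import arabic_reshaper
    from bidi.algorithm import get_display

    return get_display(arabic_reshaper.reshape(text))
```

Each legend or axis label would be passed through `prepare_label` before being handed to Matplotlib, for example `ax.set_title(prepare_label(title))`. A font that includes Persian/Arabic glyphs would also need to be configured.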
