-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add file encoding (and some read modes) at open file step #2219
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2219 +/- ##
=======================================
Coverage 93.27% 93.27%
=======================================
Files 238 238
Lines 12818 12819 +1
=======================================
+ Hits 11956 11957 +1
Misses 862 862
|
what's Codecov's deal? It's very confused |
Great to see some action on this topic! |
@bouweandela cheers! yes, but:
This leaves a couple open file calls which I'll look into now 👍 |
We could check if the problem is solved by doing |
you do that, I am not sure if I can revert the locale vars on my ancient OS 🤣 🤣 JK, can you try pls - while I am looking at the other calls that need encoding |
eugh I can't change my
very possible since OS is a little old 😁 |
Yes, but the problem occurred if the text encoding was not utf-8.
It would be safer to always write text files in utf-8 encoding, so they work regardless of what computer the user is opening them on.
JSON files are text files, so it would be best to specify the encoding there too. Image files should be opened in binary mode Pylint still gives me a list of warnings (not all relevant, but most seem relevant):
I tried testing this on my own computer, but it's a bit of a hassle because I don't have any other character set than utf-8 installed. |
.github/workflows/run-tests.yml
Outdated
@@ -59,7 +60,10 @@ jobs: | |||
- run: python -V 2>&1 | tee test_linux_artifacts_python_${{ matrix.python-version }}/python_version.txt | |||
- run: pip install -e .[develop] 2>&1 | tee test_linux_artifacts_python_${{ matrix.python-version }}/install.txt | |||
- run: flake8 | |||
- run: pytest -n 2 -m "not installation" 2>&1 | tee test_linux_artifacts_python_${{ matrix.python-version }}/test_report.txt | |||
- run: | | |||
sudo update-locale LC_ALL=C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this update-locale
thing? You shouldn't need to try and change the default locale for the whole machine. Just do an export LC_ALL=C
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh. Gonna add encoding to all open file calls anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing we want to test is that it works fine with a default character set other than utf-8, as that is what we keep running into (see linked issues).
Aye, am gonna add encoding to everything then - just to be on the safe side |
Hm. I know this is something that @bouweandela also brought up, but I feel a bit more cautious about it. For Yaml, the case is clear cut; the Yaml specs dictate UTF(-8). For all other text files, it is perfectly reasonable to use different encodings, isn't it? Also, a side question: Why are we ever opening Yaml files in binary mode? |
@zklaus indeed that was me concern too in #2219 (comment) - but me not being in the know of file encodings I assumed @bouweandela knows a tad more than me, so I was thinking of following his advice - how about I do that and test with export LC_ALL=C on GAs? |
OK @bouweandela this should now cover all instances of open file, except shapefiles, bytes, and images - I really don't want to faff around those (wb and rb strictly don't accept encoding anyway) 🍺 |
The problem is that Python uses the default encoding setting of the machine that it is running on instead of the encoding of the file, as there is no way to specify that, and therefore this causes problems when the system encoding is set to something ancient. To avoid this issue, we need to specify the encoding in all text files we are reading and writing so they work across platforms and users.
It looks like |
Cheers for chipping in @bouweandela 🍺 Thing is, don't know how to change my UNIX default encoding without having to use a Klingon translator to manage commands in the terminal 😁 |
@valeriupredoi I think you could do this (haven't tried it myself): https://askubuntu.com/a/120068. So basically you 1) make sure that the locale has been generated with the requested character set and then 2) export the |
thanks @bouweandela but you assume I can still run |
many thanks for approving and merging, gents - let's wait see if some people with bizarre encodings complain, then we can get back to it, by then I should be running a modern OS too, so I can test 😁 |
I just checked and the command is available in Ubuntu 14.04. I believe that is your favorite OS, isn't it? 🤣 |
no, I "evolved" - I am now a big fan of 16.04 🤣 |
Description
As discussed in #1585 we should specify the type of encoding (bogstandard UTF-8 here) when opening file objects; binary open does NOT require that (in fact enforces it to be binary only), so I added encodings to all the YAML files we open, and some read mode 'r' when needed.
Closes #1585
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: