-
-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decode popen output using the system locale if UTF-8 decoding fails. #380
Conversation
fixes #309 by trying to decode popen output with utf8 first and on error tries other encodings provided by the systems preferences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this maybe be simplified to try UTF-8 first, and fall back to locale.getpreferredencoding
in case of a UnicodeDecodeError
?
I would skip sys.getdefaultencoding
because it appears to be hardcoded to UTF-8. See CPython source (1, 2) and Victor Stinner's notes.
sys.stdout.encoding does not seem relevant here, as we are decoding the output of another process. IIUC, on a legacy Windows system the process output will typically be CP-1252, while our own stdout might be a console device, which is always UTF-8. On non-Windows systems, it's just the locale encoding.
(The code assumes that a wrong encoding will result in a UnicodeDecodeError, and not simply in garbled output. As far as I'm aware, this is rare when decoding to UTF-8, at least from ASCII-based encodings [citation needed 😉], so I guess this is fine.)
@Cielquan you could try installing nox from this branch to test if this solves your problem (if you haven't already!) |
Thanks for the fast replies you two! @cjolowicz >>> "ü".encode("utf8").decode("latin-1")
'ü' But I think this is better than an @stsewd |
Sooo ... I made adjustments proposed by @cjolowicz and added some tests. I use the |
Some remarks on the CI test:
...
nox > Running session tests-3.9
nox > Session tests-3.9 skipped: Python interpreter 3.9 not found.
Build success
...
Name Stmts Miss Branch BrPart Cover Missing
-----------------------------------------------------------------
nox/__init__.py 6 0 0 0 100%
nox/__main__.py 17 0 4 0 100%
nox/_decorators.py 38 0 2 0 100%
nox/_option_set.py 99 0 34 0 100%
nox/_options.py 57 0 24 0 100%
nox/_parametrize.py 74 0 36 0 100%
nox/command.py 57 0 26 0 100%
nox/logger.py 46 0 12 0 100%
nox/manifest.py 131 0 72 0 100%
nox/popen.py 33 0 6 0 100%
nox/registry.py 20 0 6 0 100%
nox/sessions.py 245 0 76 0 100%
nox/tasks.py 91 0 32 0 100%
nox/tox_to_nox.py 18 0 2 0 100%
nox/virtualenv.py 177 22 54 0 87% 194-206, 219-248
nox/workflow.py 15 0 6 0 100%
-----------------------------------------------------------------
TOTAL 1124 22 392 0 98%
EDIT1: spelling |
The test suite skips conda tests if you don't have conda installed, and those missing lines are from CondaEnv. Could that be what caused it? |
Well yeah, I totally forgot about that. Thanks. |
Updated the branch |
Updated the branch. Anything else needed from me? @theacodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Cielquan thanks for moving this forward!
Today I found some time to test this on a legacy system running Windows Server
2008 (64-bit). I could reproduce the original issue, and I could also confirm
that your fix works 🚀 See details below.
I have one suggestion, though. As you can see in the last command output below,
when both encodings fail, Python will chain the tracebacks for both
UnicodeDecodeError exceptions, including these two error messages:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
UnicodeDecodeError: '[utf-8, BIG5]' codec can't decode byte 0x95 in position 0: incomplete multibyte sequence
You can see that modifying the second exception during flight is both
unnecessary and slightly confusing. I would therefore suggest that you eliminate
the second try...catch
block, keeping only the body of the try
clause.
I'll leave a code suggestion to clarify what I mean.
Stylistically, I would also suggest to eliminate the decoded_output
variable,
and simply return
instead of assigning to it. But this is just a nitpick, feel
free to ignore.
Here are the details, FTR:
On the Windows system I used, locale.getpreferredencoding
returns CP-1252.
sys.stdout.encoding
returns UTF-8 when writing to the command prompt, and
CP-1252 when redirecting output to a file.
Like you, I used the Unicode BULLET
codepoint for testing, which is 0x95
when encoded in CP-1252.
>>> import unicodedata
>>> unicodedata.name(bytes([0x95]).decode("cp1252"))
'BULLET'
This byte sequence causes an error when decoded to UTF-8:
>>> bytes([0x95]).decode("utf8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
I used the following noxfile.py for testing:
import nox
@nox.session
def tests(session):
session.run("python", "-c", 'import sys; print(f"{sys.stdout.encoding = }")')
session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
We pass silent
to get Nox to capture stdout
. This is done by default when you get here via session.install
.
Below is the output when running nox
from the latest release, and from your branch. Expand the output using the arrows.
Reproducing the error:
(venv) C:\Users\cjolowicz>nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'utf-8'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests raised exception UnicodeDecodeError('utf-8', b'\x95', 0, 1, 'invalid start byte')
Traceback (most recent call last):
File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 549, in execute
self.func(session)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\_decorators.py", line 53, in __call__
return self.func(*args, **kwargs)
File "C:\Users\cjolowicz\noxfile.py", line 6, in tests
session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 262, in run
return self._run(*args, env=env, **kwargs)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 318, in _run
return nox.command.run(args, env=env, paths=self.bin_paths, **kwargs)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\command.py", line 111, in run
return_code, output = popen(
File "c:\users\cjolowicz\venv\lib\site-packages\nox\popen.py", line 48, in popen
return return_code, out.decode("utf-8") if out else ""
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
nox > Session tests failed.
Reproducing the error with output redirected:
(venv) C:\Users\cjolowicz>nox > nox.log 2>&1
(venv) C:\Users\cjolowicz>type nox.log
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'cp1252'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests raised exception UnicodeDecodeError('utf-8', b'\x95', 0, 1, 'invalid start byte')
Traceback (most recent call last):
File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 549, in execute
self.func(session)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\_decorators.py", line 53, in __call__
return self.func(*args, **kwargs)
File "C:\Users\cjolowicz\noxfile.py", line 6, in tests
session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 262, in run
return self._run(*args, env=env, **kwargs)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\sessions.py", line 318, in _run
return nox.command.run(args, env=env, paths=self.bin_paths, **kwargs)
File "c:\users\cjolowicz\venv\lib\site-packages\nox\command.py", line 111, in run
return_code, output = popen(
File "c:\users\cjolowicz\venv\lib\site-packages\nox\popen.py", line 48, in popen
return return_code, out.decode("utf-8") if out else ""
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
nox > Session tests failed.
Confirming that the fix works (happy case):
(venv-fix) C:\Users\build\cjolowicz>nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'utf-8'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests was successful.
(venv-fix) C:\Users\build\cjolowicz>nox > nox.log 2>&1
(venv-fix) C:\Users\build\cjolowicz>type nox.log
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python.exe in .nox\tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'cp1252'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests was successful.
Confirming that the fix works (error case: output cannot be decoded from the preferred encoding either):
❯ env LC_ALL=zh_TW.big5 nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) using python in .nox/tests
nox > python -c import sys; print(f"{sys.stdout.encoding = }")
sys.stdout.encoding = 'big5'
nox > python -c import sys; sys.stdout.buffer.write(bytes([0x95]))
nox > Session tests raised exception UnicodeDecodeError('big5', b'\x95', 0, 1, 'incomplete multibyte sequence')
Traceback (most recent call last):
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/popen.py", line 31, in decode_output
decoded_output = output.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/sessions.py", line 551, in execute
self.func(session)
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/_decorators.py", line 53, in __call__
return self.func(*args, **kwargs)
File "/tmp/nox-unicode/noxfile.py", line 6, in tests
session.run("python", "-c", 'import sys; sys.stdout.buffer.write(bytes([0x95]))', silent=True)
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/sessions.py", line 264, in run
return self._run(*args, env=env, **kwargs)
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/sessions.py", line 320, in _run
return nox.command.run(args, env=env, paths=self.bin_paths, **kwargs)
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/command.py", line 111, in run
return_code, output = popen(
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/popen.py", line 74, in popen
return return_code, decode_output(out) if out else ""
File "/tmp/nox-unicode/venv/lib/python3.9/site-packages/nox/popen.py", line 38, in decode_output
decoded_output = output.decode(second_encoding)
UnicodeDecodeError: '[utf-8, BIG5]' codec can't decode byte 0x95 in position 0: incomplete multibyte sequence
nox > Session tests failed.
@cjolowicz Thanks for your thorough answer. I also learned some new things 😀 I agree in all your points and will gladly take your suggestions and fix the tests accordingly later (hopefully today elsewise tomorrow). I think the |
Co-authored-by: Claudio Jolowicz <cjolowicz@gmail.com>
Ok fixed the tests. Was only one though. With this the PR should be ready for merging? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Looks great! Thank you both! |
Just a heads up @Cielquan and @cjolowicz: This pull request is eligible for payment via our Open Collective. You can each file an expense for $20 USD for this contribution. Thanks! |
thank you for setting this up @theacodes ❤️ |
Thanks for merging @theacodes and thanks again for reviewing @cjolowicz. |
Based on my comments in the issue #309 I created a decoding wrapper function. As
utf8
is the python default I kept it hard coded as first choice of the encodings to try. Ifutf8
failssys.stdout.encoding
,sys.getdefaultencoding()
andlocale.getpreferredencoding()
(in this order) deliver possible other encodings to try. If all decoding attempts fail theUnicodeDecodeError
exception is risen.I'm open for other orders of the encoding sources.
If this fix is well received, I will add the missing tests and fix the broken ones if there are any.
fixes #309