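from git import Repo

# Local clone of https://github.com/gentoo/gentoo
gr = Repo('/tmp/gentoo')
c = gr.commit('13e644bb36a0b1f3ef0c2091ab648978d18f369d')
# Attributes are loaded lazily, so this access parses the commit object (including gpgsig)
print(c.authored_date)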
This returns:
Traceback (most recent call last):
  File "/Users/dspadini/Documents/pydriller/tmp.py", line 341, in <module>
    print(c.authored_date)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/gitdb/util.py", line 253, in __getattr__
    self._set_cache_(attr)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 144, in _set_cache_
    self._deserialize(BytesIO(stream.read()))
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 502, in _deserialize
    self.gpgsig = sig.rstrip(b"\n").decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)
The problem is in this line. The line to be decoded is the following:
Thanks a lot for posting! In GitPython, all handling of string encodings is somewhat botched, as it was written in a time when things were silently assumed to be ascii only.
I would be happy about a PR. It looks like changing this to .decode('UTF-8', 'ignore') would do the job in most cases, while never being able to fail, and without breaking backwards compatibility.
If there was a rewrite, one would have to stop assuming any encoding, and work with bytes instead, to leave the decoding to the consumer.
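To illustrate the trade-off between the two approaches, here is a minimal sketch. The `raw_sig` value and both helper functions are hypothetical and are not part of GitPython; the trailing `0xff` byte is added on purpose because it is not valid UTF-8.

```python
# Hypothetical raw header bytes; only the J\xc3\xb6rg part comes from the report.
raw_sig = b"J\xc3\xb6rg \xff\n"


def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8 and skip undecodable bytes, so this never raises."""
    return data.rstrip(b"\n").decode("UTF-8", "ignore")


def keep_bytes(data: bytes) -> bytes:
    """Return the raw bytes and leave the choice of encoding to the consumer."""
    return data.rstrip(b"\n")


lenient = decode_lenient(raw_sig)      # the invalid 0xff byte is silently dropped
raw = keep_bytes(raw_sig)              # nothing is lost
consumer_view = raw.decode("latin-1")  # the consumer picks an encoding explicitly

print(repr(lenient), repr(raw), repr(consumer_view))
```

The lenient variant cannot fail but may lose information, while the bytes variant keeps everything and moves the decoding decision to the caller.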
Hi @Byron !
So, someone found this bug using my tool, and it traces back to GitPython. The problem is in the decoding of a non-ASCII character.
repo: https://github.com/gentoo/gentoo
commit: 13e644bb36a0b1f3ef0c2091ab648978d18f369d
code:
This returns:
The problem is in this line. The line to be decoded is the following:
As you can see, at the beginning we have `J\xc3\xb6rg`. This fails the decoding.
So, I tried to change `.decode('ascii')` to `.decode('UTF-8')` and it works.
Also, changing `.decode('ascii')` to `.decode('ascii', 'ignore')` works.
However, I am not sure whether I should do it. Why is `ascii` used in the first place (instead of UTF-8)?
Are we gonna break tests with this change?
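To make the three variants concrete, here is a minimal sketch on a stand-in byte string. The `sample` value is an assumption built from the `J\xc3\xb6rg` fragment quoted above; it is not the full header from the commit, and `try_decode` is a hypothetical helper.

```python
# Hypothetical stand-in for the line being decoded; only these bytes come from the report.
sample = b"J\xc3\xb6rg"


def try_decode(data, encoding, errors="strict"):
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


strict_ascii = try_decode(sample, "ascii")            # fails: 0xc3 is outside the ASCII range
as_utf8 = try_decode(sample, "UTF-8")                 # the two bytes decode to one character
ascii_ignore = try_decode(sample, "ascii", "ignore")  # both non-ASCII bytes are dropped

print(strict_ascii, as_utf8, ascii_ignore)
```

Both alternatives avoid the exception, but `'ascii', 'ignore'` silently loses the non-ASCII character, whereas UTF-8 preserves it.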