UnicodeDecodeError in gpgsig #915

ishepard · 2019-09-02T11:00:36Z

Hi @Byron!
Someone found this bug while using my tool, and it traces back to GitPython. The problem is in the decoding of a non-ASCII character.

repo: https://github.com/gentoo/gentoo
commit: 13e644bb36a0b1f3ef0c2091ab648978d18f369d

code:

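from git import Repo

# Local clone of https://github.com/gentoo/gentoo
gr = Repo('/tmp/gentoo')
c = gr.commit('13e644bb36a0b1f3ef0c2091ab648978d18f369d')

# Accessing any lazily loaded attribute deserializes the whole commit, including gpgsig
print(c.authored_date)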

This raises:

Traceback (most recent call last):
  File "/Users/dspadini/Documents/pydriller/tmp.py", line 341, in <module>
    print(c.authored_date)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/gitdb/util.py", line 253, in __getattr__
    self._set_cache_(attr)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 144, in _set_cache_
    self._deserialize(BytesIO(stream.read()))
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 502, in _deserialize
    self.gpgsig = sig.rstrip(b"\n").decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)

The problem is in the last line of the traceback (git/objects/commit.py, line 502). The bytes to be decoded are the following:

b'-----BEGIN PGP SIGNATURE-----\nVersion: GnuPG v2.1\nComment: Signed-off-by: J\xc3\xb6rg Bornkessel <hd_brummy@gentoo.org>\n\n.........\n-----END PGP SIGNATURE-----\n'

As you can see, the Comment header contains J\xc3\xb6rg. The bytes \xc3\xb6 are the UTF-8 encoding of "ö", which is outside the ASCII range, so the decoding fails.

So, I tried to change .decode('ascii') to .decode('UTF-8') and it works.
Also, changing .decode('ascii') to .decode('ascii', 'ignore') works.

However, I am not sure whether I should do it. Why is it ASCII in the first place (instead of UTF-8)?
Are we going to break tests with this change?
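
For reference, here is a minimal sketch that reproduces the three behaviours outside GitPython. It uses the header bytes quoted above; the signature body is elided in this report, so it is left out here as well. The variable names are only for illustration.

```python
# Header bytes copied from the report; the base64 signature body is omitted.
sig = (
    b"-----BEGIN PGP SIGNATURE-----\n"
    b"Version: GnuPG v2.1\n"
    b"Comment: Signed-off-by: J\xc3\xb6rg Bornkessel <hd_brummy@gentoo.org>\n"
    b"\n"
    b"-----END PGP SIGNATURE-----\n"
)
stripped = sig.rstrip(b"\n")

# 1. Strict ASCII decoding (what commit.py does in the traceback) raises.
try:
    stripped.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# 2. UTF-8 decoding succeeds and keeps the "ö".
as_utf8 = stripped.decode("UTF-8")

# 3. ASCII with errors="ignore" succeeds but silently drops the non-ASCII bytes.
as_ascii_ignore = stripped.decode("ascii", "ignore")

print(ascii_error)
print(as_utf8.splitlines()[2])
print(as_ascii_ignore.splitlines()[2])
```

Note that the third option does not fail, but it loses information: the name comes out without the "ö".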


Byron · 2019-09-03T07:42:38Z

Thanks a lot for posting! In GitPython, all handling of string encodings is somewhat botched, as it was written at a time when everything was silently assumed to be ASCII-only.

I would be happy about a PR. It looks like changing this to .decode('UTF-8', 'ignore') would do the job in most cases, while never being able to fail and without breaking backwards compatibility.
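
A rough sketch of why that combination is safe, assuming a small helper named decode_sig (a hypothetical name, not a GitPython function) that mirrors the line from the traceback with only the codec arguments changed:

```python
def decode_sig(sig: bytes) -> str:
    """Hypothetical helper: decode a gpgsig value without ever raising."""
    # 'ignore' drops any byte sequence that is not valid UTF-8.
    return sig.rstrip(b"\n").decode("UTF-8", "ignore")


ascii_only = b"Comment: Signed-off-by: Joe <joe@example.org>\n"
valid_utf8 = b"Comment: Signed-off-by: J\xc3\xb6rg\n"
latin1 = b"Comment: Signed-off-by: J\xf6rg\n"  # not valid UTF-8

# Pure-ASCII input decodes to the same string as before, so existing
# callers see no difference.
same_as_before = decode_sig(ascii_only) == ascii_only.rstrip(b"\n").decode("ascii")

print(same_as_before)
print(decode_sig(valid_utf8))
print(decode_sig(latin1))
```

The trade-off is the same as with any 'ignore' handler: input in another encoding (the Latin-1 case here) decodes without error but loses the offending bytes.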

If there were a rewrite, one would have to stop assuming any encoding and work with bytes instead, leaving the decoding to the consumer.
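
To illustrate what a bytes-first design could look like, here is a sketch under my own assumptions. RawSignature and its text method are hypothetical and do not exist in GitPython; the point is only that the stored value stays untouched and the caller picks the codec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSignature:
    """Hypothetical wrapper that keeps the signature exactly as read."""

    data: bytes

    def text(self, encoding: str = "utf-8", errors: str = "strict") -> str:
        # Decoding happens only on request, with the consumer's choice of
        # encoding and error handling.
        return self.data.decode(encoding, errors)


utf8_sig = RawSignature(b"Signed-off-by: J\xc3\xb6rg")
latin1_sig = RawSignature(b"Signed-off-by: J\xf6rg")

print(utf8_sig.text())             # consumer assumes UTF-8
print(latin1_sig.text("latin-1"))  # consumer knows it is Latin-1
print(latin1_sig.data)             # raw bytes are always available
```

With this shape, no encoding assumption is baked into the deserialization step, so reading a commit cannot fail on decoding.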

Byron added the acknowledged label Sep 3, 2019

Byron added the help wanted label Sep 3, 2019

vin01 mentioned this issue Sep 4, 2019

Broken fileserver.file_list, dir_list for gitpython using python3.x saltstack/salt#54402

Closed

ishepard mentioned this issue Sep 16, 2019

Fix UnicodeDecodeError in gpgsig #922

Merged

Byron closed this as completed in #922 Sep 16, 2019
