UnicodeDecodeError in gpgsig #915

Closed
ishepard opened this issue Sep 2, 2019 · 1 comment · Fixed by #922

Comments

@ishepard
Contributor

ishepard commented Sep 2, 2019

Hi @Byron !
So, someone found this bug using my tool, and it comes from GitPython. The problem is in the decoding of a non-ASCII character.

repo: https://github.com/gentoo/gentoo
commit: 13e644bb36a0b1f3ef0c2091ab648978d18f369d

code:

from git import Repo, Commit

gr = Repo('/tmp/gentoo')
c = gr.commit('13e644bb36a0b1f3ef0c2091ab648978d18f369d')

print(c.authored_date)

This returns:

Traceback (most recent call last):
  File "/Users/dspadini/Documents/pydriller/tmp.py", line 341, in <module>
    print(c.authored_date)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/gitdb/util.py", line 253, in __getattr__
    self._set_cache_(attr)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 144, in _set_cache_
    self._deserialize(BytesIO(stream.read()))
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 502, in _deserialize
    self.gpgsig = sig.rstrip(b"\n").decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)

The problem is in the last line of the traceback (git/objects/commit.py, line 502). The bytes to be decoded are the following:

b'-----BEGIN PGP SIGNATURE-----\nVersion: GnuPG v2.1\nComment: Signed-off-by: J\xc3\xb6rg Bornkessel <hd_brummy@gentoo.org>\n\n.........\n-----END PGP SIGNATURE-----\n'

As you can see, the Comment header contains J\xc3\xb6rg, which is the UTF-8 encoding of "Jörg". The byte 0xc3 is outside the ASCII range, so the decoding fails.

So, I tried to change .decode('ascii') to .decode('UTF-8') and it works.
Also, changing .decode('ascii') to .decode('ascii', 'ignore') works.
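
For reference, here is a minimal sketch that reproduces the failure outside GitPython and compares the two alternatives. It uses the signature bytes quoted above, with the body still elided. The variable names are only illustrative and are not part of GitPython.

```python
# Signature bytes as quoted above; the base64 body stays elided.
sig = (
    b"-----BEGIN PGP SIGNATURE-----\n"
    b"Version: GnuPG v2.1\n"
    b"Comment: Signed-off-by: J\xc3\xb6rg Bornkessel <hd_brummy@gentoo.org>\n"
    b"\n.........\n"
    b"-----END PGP SIGNATURE-----\n"
)
stripped = sig.rstrip(b"\n")

# Current behaviour: strict ASCII decoding raises on the first non-ASCII byte.
error_position = None
try:
    stripped.decode("ascii")
except UnicodeDecodeError as exc:
    error_position = exc.start  # index of the offending byte (0xc3)

# Alternative 1: UTF-8 decodes the two bytes \xc3\xb6 into a single "ö".
as_utf8 = stripped.decode("UTF-8")

# Alternative 2: ASCII with 'ignore' does not raise, but silently drops
# both non-ASCII bytes, so the name loses a letter.
as_ascii_ignore = stripped.decode("ascii", "ignore")

print(error_position)
print("Jörg" in as_utf8, "Jrg" in as_ascii_ignore)
```

Both alternatives avoid the exception, but only the UTF-8 one preserves the signer's name.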

However, I am not sure whether I should do it. Why is it ascii in the first place (instead of UTF-8)?
Are we gonna break tests with this change?

@Byron
Member

Byron commented Sep 3, 2019

Thanks a lot for posting! In GitPython, all handling of string encodings is somewhat botched, as it was written in a time when things were silently assumed to be ascii only.

I would be happy about a PR. It looks like changing this to .decode('UTF-8', 'ignore') would do the job in most cases, while never being able to fail and without breaking backwards compatibility.
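
A small sketch of why .decode('UTF-8', 'ignore') cannot raise: valid UTF-8 is decoded normally, and any byte sequence that is not valid UTF-8 is dropped instead of triggering an exception. The helper name decode_gpgsig and the sample inputs are hypothetical, chosen only to illustrate the proposed one-line change. They are not GitPython's API.

```python
def decode_gpgsig(sig: bytes) -> str:
    """Hypothetical helper mirroring the proposed change to the decode call."""
    return sig.rstrip(b"\n").decode("UTF-8", "ignore")


# Valid UTF-8: "ö" is encoded as the two bytes \xc3\xb6.
valid_utf8 = b"Comment: J\xc3\xb6rg\n"

# Not valid UTF-8: the same name encoded as Latin-1 (single byte \xf6).
latin1 = b"Comment: J\xf6rg\n"

decoded_valid = decode_gpgsig(valid_utf8)
decoded_latin1 = decode_gpgsig(latin1)

# For comparison, strict UTF-8 decoding would still raise on the Latin-1 input.
try:
    latin1.decode("UTF-8")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

print(decoded_valid)
print(decoded_latin1)
print(strict_raised)
```

The trade-off is that undecodable bytes vanish silently, which is why this covers "most cases" rather than all of them.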

If there was a rewrite, one would have to stop assuming any encoding, and work with bytes instead, to leave the decoding to the consumer.
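
One way to picture the bytes-first approach is the sketch below. It assumes a hypothetical RawSignature wrapper that does not exist in GitPython and is not a proposed design for it. The signature is stored exactly as read, and the consumer picks the encoding and error policy at the point of use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSignature:
    """Hypothetical wrapper: keeps the signature exactly as read, undecoded."""

    data: bytes

    def text(self, encoding: str = "utf-8", errors: str = "strict") -> str:
        # Decoding happens only on request, with the consumer's own
        # choice of encoding and error handling.
        return self.data.decode(encoding, errors)


raw = RawSignature(b"Comment: Signed-off-by: J\xc3\xb6rg Bornkessel")

# One consumer assumes UTF-8.
utf8_text = raw.text()

# Another insists on ASCII and wants visible markers for undecodable bytes.
ascii_text = raw.text("ascii", "replace")

print(utf8_text)
print(ascii_text)
```

The original bytes are never lost, so a wrong guess about the encoding can be corrected later by decoding again.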
