Modules/cjkcodecs/_codecs_iso2022.c - read out of bounds #101180
Comments
To make it easy to reproduce, build your PR branch using: `./configure --with-address-sanitizer && make`
I turned your report into a PR. It should show up in the CI address sanitizer run there (confirmed locally).
Hi everyone!
I cannot even build Python with such configure options.
You should set |
jisx0213_encoder() is called 3 times, the 3rd time, it's called with length=2 but apparently reading
How can |
gdb commands with local changes to add some logs (with printf):
…ecs read out of bounds

The iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds when encoding a Unicode combining character sequence. The bug causes the following error:

    $ python3 -c "print('\u304b\u309a'.encode('iso2022_jp_2004'))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'iso2022_jp_2004' codec can't encode character '\u309a' in position 1: illegal multibyte sequence

This commit fixes the out-of-bounds read.
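The one-liner in the commit message above can also be run as a small script that reports the outcome instead of aborting on the exception. This is only a sketch: `try_encode` is a hypothetical helper name, not part of CPython, and whether the combining sequence encodes or raises depends on whether the interpreter includes the fix.

```python
def try_encode(text, codec):
    """Encode text with codec and report the outcome without raising.

    Hypothetical helper for illustration: returns ("ok", bytes) on success
    or ("error", message) if the codec rejects the input.
    """
    try:
        return ("ok", text.encode(codec))
    except UnicodeEncodeError as exc:
        return ("error", str(exc))


# U+304B followed by U+309A is the combining character sequence
# from the commit message.
SEQUENCE = "\u304b\u309a"

for codec in ("iso2022_jp_3", "iso2022_jp_2004"):
    status, detail = try_encode(SEQUENCE, codec)
    print(codec, status, detail)
```

An "error" status for the combining sequence matches the traceback quoted in the commit message; an "ok" status suggests the interpreter already carries the fix.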
iso2022_jp_3 and iso2022_jp_2004 are upward compatible with iso2022_jp. In addition to testing iso2022_jp, we test the following characters added in iso2022_jp_3 and iso2022_jp_2004.

    JIS X 0213        Unicode
    ----------------  ---------------------------------------------
    Plane 1 \x2E\x23  U+3402         Basic Multilingual Plane
    Plane 1 \x2E\x22  U+2000B        Supplementary Ideographic Plane
    Plane 1 \x24\x77  U+304B U+309A  Combining Character Sequence
    Plane 2 \x21\x22  U+4E02         Basic Multilingual Plane
    Plane 2 \x7E\x76  U+2A6B2        Supplementary Ideographic Plane

The difference between iso2022_jp_3 and iso2022_jp_2004 is the difference between JIS X 0213:2000 and JIS X 0213:2004. We also test the following character, which was added between JIS X 0213:2000 and JIS X 0213:2004.

    JIS X 0213:2004   Unicode
    ----------------  -------
    Plane 1 \x2E\x21  U+4FF1

Escape sequences to designate the JIS X 0213 character sets to G0:

    character set            ESC sequence
    -----------------------  ---------------------------
    JIS X 0213:2000 Plane 1  ESC 2/4 2/8 4/15  ESC $ ( O
    JIS X 0213:2000 Plane 2  ESC 2/4 2/8 5/0   ESC $ ( P
    JIS X 0213:2004 Plane 1  ESC 2/4 2/8 5/1   ESC $ ( Q
    JIS X 0213:2004 Plane 2  ESC 2/4 2/8 5/0   ESC $ ( P
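To see how the byte pairs and escape sequences listed above fit together, the following sketch builds each byte stream by hand and decodes it with Python's built-in codecs. It uses decoding rather than encoding so that it does not depend on the encoder fix. `decode_jisx0213`, `DESIGNATIONS` and `SAMPLES` are names made up for this illustration, and the trailing `ESC ( B` (return to ASCII) is an assumption about what a well-formed stream ends with.

```python
ESC = b"\x1b"
TO_ASCII = ESC + b"(B"  # assumed terminator: switch G0 back to ASCII

# Final byte of the "ESC $ ( F" designation for each codec and plane,
# taken from the escape sequence listing in the commit message.
DESIGNATIONS = {
    ("iso2022_jp_3", 1): b"O",
    ("iso2022_jp_3", 2): b"P",
    ("iso2022_jp_2004", 1): b"Q",
    ("iso2022_jp_2004", 2): b"P",
}

# (plane, two-byte code) pairs from the commit message.
SAMPLES = [
    (1, b"\x2e\x23"),  # U+3402
    (1, b"\x2e\x22"),  # U+2000B
    (1, b"\x24\x77"),  # U+304B U+309A
    (2, b"\x21\x22"),  # U+4E02
    (2, b"\x7e\x76"),  # U+2A6B2
]


def decode_jisx0213(codec, plane, code):
    """Designate the plane to G0, emit one two-byte code, and decode it."""
    stream = ESC + b"$(" + DESIGNATIONS[(codec, plane)] + code + TO_ASCII
    return stream.decode(codec)


for codec in ("iso2022_jp_3", "iso2022_jp_2004"):
    for plane, code in SAMPLES:
        text = decode_jisx0213(codec, plane, code)
        points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(f"{codec} plane {plane} {code.hex()} -> {points}")
```

The only difference between the two codecs in this sketch is the Plane 1 designation (`O` versus `Q`); Plane 2 uses `P` in both.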
I reproduced it using the following steps:
…ecs read out of bounds (pythongh-111695) (cherry picked from commit c8faa35) Co-authored-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
…004 codecs read out of bounds (pythongh-111695) (cherry picked from commit c8faa35) Co-authored-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
…04 codecs read out of bounds (pythongh-111695) (cherry picked from commit c8faa35) Co-authored-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
…ecs read out of bounds (pythongh-111695)
Bug report
Your environment
Steps to reproduce
Prerequisites
crashfile.txt
test.py
Linked PRs