Wrong index #2

Open
sraaphorst opened this issue Mar 1, 2023 · 1 comment

@sraaphorst

i - 1))

This line (and the same call in ex 4.4) should have the -1 outside the call to utf8.offset, not inside it. Try these examples and you will see:

-- Japanese example.
jp = '私は日本語が分かります。'
print(jp)
print(insert(jp, 3, "少し"))

-- Chinese example.
zh = "我學了四年了,可是很長的時間沒有練習。"
print(zh)
print(insert(zh, 4, "中文"))

-- Japanese example.
jp = '私は少し日本語が分かります。'
print(jp)
print(remove(jp, 3, 2))

-- Chinese example.
zh = "我學了中文四年了,可是很長的時間沒有練習。"
print(zh)
print(remove(zh, 4, 2))
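
To make the difference concrete, here is a small sketch of the two byte offsets for the Japanese string above. It assumes Lua 5.3 or later (for the utf8 library) and assumes the original call had the shape utf8.offset(s, i - 1); the variable names inside and outside are only for illustration.

```lua
-- Requires Lua 5.3+ for the standard utf8 library.
local jp = '私は日本語が分かります。'
local i = 3

-- Assumed shape of the original call: the -1 is inside utf8.offset,
-- so we get the byte where character i-1 STARTS.
local inside = utf8.offset(jp, i - 1)

-- Proposed shape: the -1 is outside, so we get the LAST byte
-- of character i-1 (one byte before character i starts).
local outside = utf8.offset(jp, i) - 1

print(inside, outside)

-- Cutting at `inside` keeps only the first byte of the second character,
-- so the prefix is no longer valid UTF-8 and utf8.len reports a failure.
print(utf8.len(jp:sub(1, inside)))

-- Cutting at `outside` keeps two whole characters.
print(utf8.len(jp:sub(1, outside)))
```

Both of the first two characters here take three bytes in UTF-8, which is why the two offsets differ by more than one.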

@rswinkle

Good catch. For others who find this thread later: sub deals in bytes, so to include all of the bytes of the potentially multi-byte i-th UTF-8 character, we have to get the byte position of the (i+1)-th character and subtract 1.
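
For reference, a minimal sketch of what insert and remove could look like with the -1 on the outside. This is not the repository's actual code, only one way to write it under the assumptions that Lua 5.3+ is available, that positions are 1-based character indices, and that the arguments are in range (utf8.offset returns nil otherwise).

```lua
-- Requires Lua 5.3+ for the standard utf8 library.

-- Insert `new` before the i-th character of the UTF-8 string `s`.
local function insert(s, i, new)
  local pos = utf8.offset(s, i)        -- byte where character i starts
  return s:sub(1, pos - 1) .. new .. s:sub(pos)
end

-- Remove `n` characters from `s`, starting at the i-th character.
local function remove(s, i, n)
  local first = utf8.offset(s, i)      -- byte where character i starts
  local after = utf8.offset(s, i + n)  -- byte where the kept tail starts
  return s:sub(1, first - 1) .. s:sub(after)
end

print(insert('私は日本語が分かります。', 3, '少し'))
print(remove('私は少し日本語が分かります。', 3, 2))
```

In both functions the prefix ends at the byte just before character i starts, so no multi-byte character is split.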

jxlin123 added a commit to jxlin123/pil-4th that referenced this issue Dec 9, 2024
As discussed on <oitofelix#2>, the
original solutions for exercises 4.4 and 4.6 may accidentally "cut off"
bytes, given the nature of Unicode codepoints possibly being encoded
using multiple bytes. I've now gone ahead and applied the fix as
described in that link.
@jxlin123 jxlin123 mentioned this issue Mar 9, 2025