What's the problem?
I have a valid UTF-8 byte sequence that decodes perfectly with bytes.decode('utf-8'). However, when the decoded string is used as a column value, datatable fails to decode it.
How to reproduce the bug?
Here is a reproducible example.
import datatable

text_as_bytes = b'\xe7\x94\xa8\xe8\xb5\xb7\xe6\x9d\xa5\xe8\xbf\x98\xe6\x98\xaf\xe5\xbe\x88\xe4\xb8\x8d\xe7\xa8\xb3\xe5\xae\x9a\xe3\x80\x82\xe5\xbe\x88\xe5\xa4\x9a\xe6\x8c\x89\xe9\x94\xae\xe9\x83\xbd\xe8\xa6\x81\xe7\x82\xb9\xe5\xa5\xbd\xe5\x87\xa0\xe6\xac\xa1\xe6\x89\x8d\xe8\xa1\x8c\xe3\x80\x82'
text_as_str = text_as_bytes.decode('utf-8')
# text_as_str prints correctly
print(text_as_str)  # 用起来还是很不稳定。很多按键都要点好几次才行。
# create a datatable Frame from text_as_str
dt = datatable.Frame({'text': [text_as_str]})
# displaying the Frame in a Jupyter cell raises:
# UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 363-364: invalid continuation byte
dt
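The bytes in this example encode 23 characters of 3 bytes each, 69 bytes in total, which is above the 50-byte threshold mentioned in the fix note. A plausible mechanism (an assumption; the thread does not show datatable's rendering code) is that the display cut the string at a fixed byte count, splitting a multi-byte character, and then appended further output. This Python sketch reproduces that kind of failure using a hypothetical `LIMIT` and an appended `...` marker:

```python
# Sketch of a hypothetical failure mode: cutting UTF-8 bytes at a fixed limit.
text = "用起来还是很不稳定。很多按键都要点好几次才行。"
data = text.encode("utf-8")
# Every character in this string takes 3 bytes in UTF-8.
print(len(text), "characters,", len(data), "bytes")

# Hypothetical display code: keep the first LIMIT bytes, then append more output.
LIMIT = 50
rendered = data[:LIMIT] + b"..."

error = None
try:
    rendered.decode("utf-8")
except UnicodeDecodeError as exc:
    # The kept bytes end with the first 2 of the 3 bytes of the 17th character,
    # so the following b"." is rejected as an invalid continuation byte.
    error = exc
    print(exc)
```

The decoder reports the same "invalid continuation byte" reason seen in the issue; the position differs, presumably because the real output contained surrounding markup before the string.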
Your environment?
Python: 3.7 + JupyterLab 0.35.4
OS: Ubuntu 16.04
Fixed a UnicodeDecodeError that could be raised when viewing a Frame containing Unicode characters in a Jupyter notebook. The error only manifested for strings longer than 50 bytes.
Closes #1825
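A common way to avoid this class of bug, shown here as a generic sketch rather than the actual change made in datatable, is to back up from the byte limit to the nearest character boundary before cutting. UTF-8 continuation bytes always match the bit pattern `10xxxxxx`, so the hypothetical helper `truncate_utf8` steps back past them:

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 `data` to at most `max_bytes` without splitting a character.

    Hypothetical helper for illustration; not part of datatable's API.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so move the cut back to that character's lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


text = "用起来还是很不稳定。很多按键都要点好几次才行。"
data = text.encode("utf-8")
cut = truncate_utf8(data, 50)
# The cut backs off to 48 bytes (16 whole characters) and decodes cleanly.
print(len(cut), cut.decode("utf-8"))
```

Truncating by characters (`text[:n]`) before encoding is an equally valid alternative when the limit does not have to be measured in bytes.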