@lizisepul It's hard to tell from your comment, but is the original string proper UTF-8? If not, CLD won't be able to handle it, and I'd recommend filtering out those characters before feeding the text to this package.
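As a rough sketch of the filtering suggested above: the snippet below drops Unicode control characters (category `Cc`) before the text reaches the language detector, while keeping ordinary whitespace. The helper name `strip_control_chars` is hypothetical and not part of spacy-cld or pycld2, and it assumes that the control character is what the detector rejects, which this thread does not confirm.

```python
import unicodedata


def strip_control_chars(text: str) -> str:
    """Remove Unicode control characters (category Cc), keeping tab/newline/CR.

    Hypothetical helper for illustration; not part of spacy-cld or pycld2.
    """
    keep = {"\n", "\r", "\t"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


# The string from the report contains U+007F (DEL), a control character.
cleaned = strip_control_chars("Con\x7fanza")
print(cleaned.encode("utf-8"))  # b'Conanza'
```

The cleaned string can then be passed to the pipeline in place of the raw text. Whether dropping the character or replacing it with a space is preferable depends on the data.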
One problem is that the PyPI package still suffers from this issue: #1
Can the package receive an update? Or can you make a pre-release on GitHub? If the spaCy pipeline doesn't crash on Unicode errors, it's easier to handle these cases.
Hi,
I am adding spacy-cld as a component of a spacy pipeline. I am getting the following error with the string 'Con�anza'.
lib/python3.6/site-packages/spacy_cld/spacy_cld.py", line 20, in detect_languages
    _, _, results = detect(text.text)
pycld2.error: input contains invalid UTF-8 around byte 3 (of 8)
The word looks like above when you print it.
print('Con�anza'.encode(encoding='utf-8'))
b'Con\x7fanza'
Thanks
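To relate the error message to the printed bytes, here is a small check of the encoded string. It assumes the offset in "around byte 3 (of 8)" is zero-based, which the thread does not state. Python itself decodes these bytes as valid UTF-8, so the rejection appears to come from the control character rather than from a malformed byte sequence.

```python
# Encode the reported string and inspect the byte named in the error message.
data = "Con\x7fanza".encode("utf-8")

offset = 3  # assumed zero-based, per "around byte 3 (of 8)"
print(len(data))          # 8
print(hex(data[offset]))  # 0x7f

# A strict decode succeeds, so Python treats the input as well-formed UTF-8.
roundtrip = data.decode("utf-8")
```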