terminate called after throwing an instance of 'Exception' what(): Error: Malformed input stream. #92
Comments
It's because you have an unescaped |
How would you escape it?
throws the same exception
The escape has to get to the underlying pipe.
This still didn't work.
Also, is there a specific list of characters that need to be escaped?
Try adding another backslash? :)
It seems that backslashes are only interpreted as backslashes here...
Probably what this indicates is that there should be a way to have |
Also, I think this should actually be on https://github.com/apertium/apertium-python, but for some reason I am not able to transfer it there.
Dear colleagues, thank you for your work. How do I fix this? Is there a workaround, maybe? Minimal example:
import re
from typing import List

import apertium
from streamparser import LexicalUnit  # assumed import path for the LexicalUnit type

ESC_PATTERN = re.compile("([/^$<>*{}\\\\@#+~])", re.UNICODE)
analyzer = apertium.Analyzer("kir")
text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."
# Prefix each matched character with two backslashes before analysis.
text = re.sub(ESC_PATTERN, r"\\\\\1", text.strip())
print(text)
analysis: List[LexicalUnit] = analyzer.analyze(text)
print([lexical_unit.wordform for lexical_unit in analysis])
Output
Thanks in advance.
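One detail in the example above may be worth checking: in a Python replacement template, r"\\\\\1" inserts two backslashes before the matched character, so the slash itself ends up preceded by an escaped backslash rather than by an escape. A minimal sketch of the single-backslash variant follows. The character set is copied from ESC_PATTERN in this thread, the name escape_reserved is hypothetical, and whether apertium-python forwards the escaped text unchanged to the underlying pipe is not confirmed here.

```python
import re

# Character set copied from the ESC_PATTERN used in this thread.
RESERVED = re.compile(r"([/^$<>*{}\\@#+~])")


def escape_reserved(text: str) -> str:
    """Prefix each reserved character with exactly one backslash."""
    # In a replacement template, "\\" is one literal backslash and "\1" is the match.
    return RESERVED.sub(r"\\\1", text)


print(escape_reserved("ВИЧ/СПИД"))
```

Printing the result before passing it to the analyzer makes it easy to see how many backslashes actually reach the input.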
My own workaround is the following:
from typing import List

import apertium
from streamparser import LexicalUnit  # assumed import path for the LexicalUnit type

# Replace each special character with a placeholder word before analysis,
# then map the placeholders back to the original characters afterwards.
SPECIAL_CHARACTERS = list("/^$<>*{}\\@#+~")
REPLACEMENTS = ["slashchar", "capchar", "dollarchar", "lesschar", "morechar", "astchar",
                "curlyleftchar", "curlyrightchar", "backslashchar", "atchar", "hashchar",
                "pluschar", "tildechar"]
assert len(SPECIAL_CHARACTERS) == len(REPLACEMENTS)
spchar2code = {ch: co for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}
code2spchar = {co: ch for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}
analyzer = apertium.Analyzer("kir")
text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."
for spc in spchar2code:
    text = text.replace(spc, f" {spchar2code[spc]} ")
print(text)
analysis: List[LexicalUnit] = analyzer.analyze(text)
tokens = [lu.wordform if lu.wordform not in code2spchar else code2spchar[lu.wordform] for lu in analysis]
print(tokens)
but clearly that's not how the cool kids should do it.
I would maybe just send it through apertium-destxt, though I don't know if apertium-python has some built-in way or whether you have to call subprocess.communicate yourself.
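A rough sketch of the subprocess route, assuming the apertium-destxt binary is installed and on PATH, and that it reads text on stdin and writes the deformatted text to stdout. The function name deformat and its cmd parameter are made up for illustration; this does not use any apertium-python API.

```python
import subprocess
from typing import Sequence


def deformat(text: str, cmd: Sequence[str] = ("apertium-destxt",)) -> str:
    """Pipe text through a deformatter command and return what it writes to stdout."""
    result = subprocess.run(
        list(cmd),
        input=text,
        capture_output=True,
        encoding="utf-8",  # text mode with an explicit encoding for non-ASCII input
        check=True,        # raise CalledProcessError if the command exits non-zero
    )
    return result.stdout
```

Something like deformat("ВИЧ/СПИД") could then be compared against a hand-escaped string to see what the deformatter does with the slash. The cmd parameter is only there so another command can be swapped in for testing.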
Thank you, will give it a try!
I am running the Apertium analyzer from a Python script. I get this exception, which terminates the script immediately. I am not able to catch it inside Python; it seems to be raised in C++ and is not handled there. How can I handle it?
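Since "terminate called" means the process is aborted rather than a Python exception being raised, one possible way to keep the calling script alive is to run the analysis in a child process and inspect its return code. The sketch below is only an illustration of that idea: run_isolated is a hypothetical helper, and os.abort() stands in for the analyzer call that crashes. A real child script would import apertium and print its analysis instead.

```python
import subprocess
import sys


def run_isolated(script: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter and return the completed process."""
    return subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Stand-in for the crashing analyzer call: os.abort() kills the child process,
# much like an uncaught C++ exception ending in std::terminate does.
result = run_isolated("import os; os.abort()")
if result.returncode != 0:
    print("child failed with return code", result.returncode)
```

The parent keeps running whatever happens to the child, so it can log the failing input and move on to the next one.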
To replicate the issue: