terminate called after throwing an instance of 'Exception' what(): Error: Malformed input stream. #92

Open
eagad opened this issue Mar 8, 2021 · 13 comments · May be fixed by #93

Comments

@eagad

eagad commented Mar 8, 2021

I am running the Apertium analyzer from a Python script. I get this exception, which terminates the script immediately. I am not able to catch it inside Python; it seems to happen in C++ and doesn't get handled. How can I handle it?

terminate called after throwing an instance of 'Exception'
what(): Error: Malformed input stream.
Aborted (core dumped)

To replicate the issue:

import apertium
apertium.analyze('en', 'Hi/Hello')

@mr-martian

It's because you have an unescaped / in your input string.

@eagad
Author

eagad commented Mar 8, 2021

How would you escape it?

apertium.analyze('en', r'Hi/Hello')

throws the same exception

@mr-martian

'Hi\\/Hello'

the escape has to get to the underlying pipe
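
To make the difference between the literals in this thread concrete, here is a small Python-only sketch (no Apertium involved). It only shows which characters Python puts into each string; whether the wrapper then forwards the backslash unchanged to the underlying pipe is a separate question that this snippet does not answer.

```python
# What each string literal from this thread actually contains.
plain = 'Hi/Hello'
raw = r'Hi/Hello'        # the r prefix changes nothing here: there is no backslash to protect
escaped = 'Hi\\/Hello'   # Python collapses \\ in a normal literal to ONE backslash character

print(plain == raw)              # the raw and plain literals are the same string
print(list(escaped))             # shows each character separately, including the backslash
print(len(plain), len(escaped))  # escaped is exactly one character longer
```

So `r'Hi/Hello'` sends the same bytes as the original input, while `'Hi\\/Hello'` sends a single backslash followed by the slash.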

@eagad
Author

eagad commented Mar 8, 2021

This still didn't work

apertium.analyze('en', 'Hi\\/Hello')

terminate called after throwing an instance of 'Exception'
what(): Error: Malformed input stream.
Aborted (core dumped)

Also, is there a specific list of characters that need to be escaped?

@mr-martian

https://wiki.apertium.org/wiki/Apertium_stream_format

@ftyers
Member

ftyers commented Mar 8, 2021

> This still didn't work
>
> apertium.analyze('en', 'Hi\\/Hello')
>
> terminate called after throwing an instance of 'Exception'
> what(): Error: Malformed input stream.
> Aborted (core dumped)
>
> Also, is there specific list for characters that need to be escaped?

Try adding another backslash? :)

@eagad
Author

eagad commented Mar 8, 2021

seems that backslashes are only interpreted as backslashes here...
Any ideas other than removing all the forward slashes from the text I am trying to process?

@mr-martian

What this probably indicates is that there should be a way to have analyze() invoke deformatters, if there isn't one already.

@mr-martian

Also, I think this should actually be on https://github.com/apertium/apertium-python, but for some reason I am not able to transfer it there.

@sushain97 sushain97 transferred this issue from apertium/apertium-apy Mar 9, 2021
@alexeyev

alexeyev commented Jul 25, 2023

Dear colleagues, thank you for your work.

How do I fix this? Is there a workaround, perhaps?

Minimal example:

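    import re
    from typing import List

    import apertium
    from streamparser import LexicalUnit  # from the apertium-streamparser package

    # Characters reserved by the Apertium stream format
    ESC_PATTERN = re.compile("([/^$<>*{}\\\\@#+~])", re.UNICODE)
    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."
    text = re.sub(ESC_PATTERN, r"\\\\\1", text.strip())
    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    print([lexical_unit.wordform for lexical_unit in analysis])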

Output

Кыргызстанда ВИЧ\\/СПИД менен күрөшүүгө акча жетишпейт.
Error: malformed input stream: Found unexpected character / unescaped in stream
: iostream error
['Кыргызстанда', 'ВИЧ', '\\\\/\\\\<sent>']

Thanks in advance.
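
One detail worth checking in the example above: the replacement template `r"\\\\\1"` emits two backslashes before each reserved character, which matches the printed `ВИЧ\\/СПИД`. In the stream format, `\\` reads as an escaped backslash, so the `/` after it would still be unescaped. The sketch below contrasts that with a single-backslash template. The helper name `escape_reserved` is hypothetical, and whether the analyzer accepts the single-backslash form in a given apertium-python setup is not confirmed anywhere in this thread.

```python
import re

# Reserved characters, copied from the pattern in the example above.
ESC_PATTERN = re.compile(r"([/^$<>*{}\\@#+~])")


def escape_reserved(text: str) -> str:
    """Prefix each reserved character with exactly one backslash (hypothetical helper)."""
    # In a re.sub template, \\ emits one literal backslash and \1 re-inserts the match.
    return ESC_PATTERN.sub(r"\\\1", text)


sample = "ВИЧ/СПИД"
single_escaped = escape_reserved(sample)
# The template from the original example emits TWO backslashes before the match.
double_escaped = ESC_PATTERN.sub(r"\\\\\1", sample)

print(single_escaped)
print(double_escaped)
```

Printing both results side by side makes it easy to see which form actually reaches the analyzer.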

@alexeyev

My own workaround is the following

    SPECIAL_CHARACTERS = list("/^$<>*{}\\@#+~")
    REPLACEMENTS = ["slashchar", "capchar", "dollarchar", "lesschar", "morechar", "astchar",
                    "curlyleftchar", "curlyrightchar", "backslashchar", "atchar", "hashchar",
                    "pluschar", "tildechar"]

    assert len(SPECIAL_CHARACTERS) == len(REPLACEMENTS)

    spchar2code = {ch: co for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}
    code2spchar = {co: ch for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}

    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."

    for spc in spchar2code:
        text = text.replace(spc, f" {spchar2code[spc]} ")

    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    tokens = [lu.wordform if lu.wordform not in code2spchar else code2spchar[lu.wordform] for lu in analysis]
    print(tokens)

but clearly that's not how the cool kids should do it.

@unhammer
Member

I would maybe just send it through apertium-destxt, though I don't know whether apertium-python has some built-in way to do that or whether you have to subprocess.communicate yourself.
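
A minimal sketch of the subprocess route, assuming the `apertium-destxt` executable is installed and on `PATH`, and that it reads from stdin and writes to stdout. The function name `deformat_text` and its `cmd` parameter are hypothetical, and this thread does not confirm whether `analyze()` accepts already-deformatted text.

```python
import subprocess
from typing import Sequence


def deformat_text(text: str, cmd: Sequence[str] = ("apertium-destxt",)) -> str:
    """Pipe text through a deformatter command and return what it prints.

    The default command assumes apertium-destxt is installed; pass another
    command to try the plumbing without Apertium.
    """
    result = subprocess.run(
        list(cmd),
        input=text,           # sent to the child's stdin
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        encoding="utf-8",     # encode/decode the pipes as UTF-8 text
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Hypothetical usage (requires Apertium to be installed):
# escaped = deformat_text("Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт.")
```

Because a failure in the child process surfaces as a Python `CalledProcessError`, it can be caught with an ordinary `try`/`except`, unlike the abort reported at the top of this issue.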

@alexeyev

Thank you, will give it a try!
