terminate called after throwing an instance of 'Exception' what(): Error: Malformed input stream. #92

eagad · 2021-03-08T20:43:18Z

I am running the apertium analyzer from a Python script. I get this exception, which terminates the script immediately. I am not able to catch it inside Python; it seems to be happening in C++ and doesn't get handled. How can I handle it?

terminate called after throwing an instance of 'Exception'
what(): Error: Malformed input stream.
Aborted (core dumped)

To replicate the issue:

import apertium
apertium.analyze('en', 'Hi/Hello')
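
Since the process is aborted from native code (`Aborted (core dumped)`), a `try/except` in the same interpreter never gets to run. One possible way to keep the calling script alive is to run the risky call in a child interpreter and inspect its exit status. This is only a sketch under that assumption, not an apertium-python feature; `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(snippet: str, timeout: float = 60.0) -> Tuple[int, str, str]:
    """Run a Python snippet in a child interpreter (hypothetical helper).

    Returns (returncode, stdout, stderr). A non-zero return code means the
    child failed; on POSIX a negative value means it was killed by a signal,
    which is how an abort in native code appears to the parent.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # The call from the report above; needs apertium and the 'en' data installed.
    code = "import apertium; print(apertium.analyze('en', 'Hi/Hello'))"
    returncode, out, err = run_isolated(code)
    if returncode != 0:
        print("analysis failed, parent still running:", err.strip())
    else:
        print(out)
```

The cost is one interpreter start-up per call, so this suits occasional calls or batching better than per-token use.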

mr-martian · 2021-03-08T20:45:04Z

It's because you have an unescaped / in your input string.

eagad · 2021-03-08T20:51:43Z

How would you escape it?

apertium.analyze('en', r'Hi/Hello')

throws the same exception

mr-martian · 2021-03-08T20:52:24Z

'Hi\\/Hello'

the escape has to get to the underlying pipe
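
For reference, a short check of what the two spellings above look like on the Python side, before anything reaches Apertium. It only illustrates Python string literals and makes no claim about what the wrapper does with the text afterwards.

```python
raw = r'Hi/Hello'        # raw prefix changes nothing: there is no backslash to preserve
plain = 'Hi/Hello'
escaped = 'Hi\\/Hello'   # one literal backslash followed by a slash

print(raw == plain)             # True
print(len(escaped))             # 9
print(escaped)                  # Hi\/Hello
print(escaped == r'Hi\/Hello')  # True
```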

eagad · 2021-03-08T20:55:25Z

This still didn't work

apertium.analyze('en', 'Hi\\/Hello')

terminate called after throwing an instance of 'Exception'
what(): Error: Malformed input stream.
Aborted (core dumped)

Also, is there a specific list of characters that need to be escaped?

mr-martian · 2021-03-08T21:00:39Z

https://wiki.apertium.org/wiki/Apertium_stream_format

ftyers · 2021-03-08T21:01:17Z

> This still didn't work
>
> apertium.analyze('en', 'Hi\\/Hello')
>
> terminate called after throwing an instance of 'Exception'
> what(): Error: Malformed input stream.
> Aborted (core dumped)
>
> Also, is there a specific list of characters that need to be escaped?

Try adding another backslash? :)

eagad · 2021-03-08T21:14:55Z

It seems that backslashes are only interpreted as backslashes here...
Any ideas other than removing all the forward slashes from the text I am trying to process?

mr-martian · 2021-03-08T22:47:00Z

Probably what this indicates is that there should be a way to have analyze() invoke deformatters, if there isn't one already.

mr-martian · 2021-03-08T22:49:38Z

Also, I think this should actually be on https://github.com/apertium/apertium-python, but for some reason I am not able to transfer it there.

alexeyev · 2023-07-25T11:05:55Z

Dear colleagues, thank you for your work.

How do I fix this? Is there a workaround, maybe?

Minimal example:

    import re
    from typing import List

    import apertium
    from streamparser import LexicalUnit  # provided by the apertium-streamparser package

    ESC_PATTERN = re.compile("([/^$<>*{}\\\\@#+~])", re.UNICODE)
    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."
    text = re.sub(ESC_PATTERN, r"\\\\\1", text.strip())
    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    print([lexical_unit.wordform for lexical_unit in analysis])

Output

Кыргызстанда ВИЧ\\/СПИД менен күрөшүүгө акча жетишпейт.
Error: malformed input stream: Found unexpected character / unescaped in stream
: iostream error
['Кыргызстанда', 'ВИЧ', '\\\\/\\\\<sent>']

Thanks in advance.
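
One observation on the output above: the printed text has two backslashes before `/`. If the stream treats `\\` as an escaped backslash, the `/` after it is left unescaped, which would match the "Found unexpected character / unescaped" message. Below is a minimal sketch of a single-backslash variant. `escape_reserved` is a hypothetical name, and the character set is copied from `ESC_PATTERN` above. Nothing in this thread confirms that text escaped this way passes through `Analyzer.analyze`; an earlier single-escape attempt with `apertium.analyze` also failed.

```python
import re

# Character set copied from ESC_PATTERN in the example above.
RESERVED = re.compile(r"([/^$<>*{}\\@#+~])")


def escape_reserved(text: str) -> str:
    """Prefix each reserved character with exactly one backslash."""
    # A function replacement avoids the extra layer of escaping that a
    # replacement template string would need.
    return RESERVED.sub(lambda m: "\\" + m.group(1), text)


print(escape_reserved("ВИЧ/СПИД"))  # ВИЧ\/СПИД
```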

alexeyev · 2023-07-25T11:30:59Z

My own workaround is the following

    from typing import List

    import apertium
    from streamparser import LexicalUnit  # provided by the apertium-streamparser package

    # Characters to mask before analysis, each paired with a placeholder word.
    SPECIAL_CHARACTERS = list("/^$<>*{}\\@#+~")
    REPLACEMENTS = ["slashchar", "capchar", "dollarchar", "lesschar", "morechar", "astchar",
                    "curlyleftchar", "curlyrightchar", "backslashchar", "atchar", "hashchar",
                    "pluschar", "tildechar"]

    assert len(SPECIAL_CHARACTERS) == len(REPLACEMENTS)

    spchar2code = {ch: co for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}
    code2spchar = {co: ch for ch, co in zip(SPECIAL_CHARACTERS, REPLACEMENTS)}

    analyzer = apertium.Analyzer("kir")
    text = "Кыргызстанда ВИЧ/СПИД менен күрөшүүгө акча жетишпейт."

    # Replace every special character with its placeholder, padded with spaces
    # so that it is analysed as a separate token.
    for spc in spchar2code:
        text = text.replace(spc, f" {spchar2code[spc]} ")

    print(text)
    analysis: List[LexicalUnit] = analyzer.analyze(text)
    # Map placeholder tokens back to the original characters.
    tokens = [lu.wordform if lu.wordform not in code2spchar else code2spchar[lu.wordform] for lu in analysis]
    print(tokens)

but clearly that's not how the cool kids should do it.

unhammer · 2023-07-25T15:19:37Z

I would maybe just send it through apertium-destxt, though I don't know whether apertium-python has some built-in way or you have to call it through subprocess (Popen.communicate) yourself.
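
A minimal sketch of the subprocess route, assuming `apertium-destxt` is on `PATH` and is used as a stdin-to-stdout filter. `deformat` is a hypothetical helper name, and the `command` parameter exists only so that another filter can be swapped in. How to hand the deformatted text to apertium-python's analyzer without it being processed a second time is not covered in this thread.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def deformat(text: str, command: Optional[Sequence[str]] = None) -> str:
    """Pipe text through a deformatter and return its stdout (hypothetical helper)."""
    cmd = list(command) if command is not None else ["apertium-destxt"]
    proc = subprocess.run(
        cmd,
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return proc.stdout


if __name__ == "__main__":
    # Only attempt the real call when the tool is actually installed.
    if shutil.which("apertium-destxt"):
        print(deformat("Hi/Hello"))
    else:
        print("apertium-destxt not found on PATH")
```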

alexeyev · 2023-07-28T04:49:19Z

Thank you, will give it a try!

sushain97 transferred this issue from apertium/apertium-apy Mar 9, 2021

singh-lokendra linked a pull request Mar 9, 2021 that will close this issue: update deformatter (apertium-destxt) escape chars #93 (Open)
