Error with ACE and --tsdb-stdout with long unparsable sentences #86

goodmami · 2016-11-11T18:54:07Z

When ACE v0.9.24 is installed, pyDelphin defaults to using the --tsdb-stdout option in order to get more information out of ACE. NOTEs and other messages are still printed to stderr as normal. When a long unparsable sentence causes a NOTE to be generated, the message may be flushed to stderr before the stdout line has finished being written. Because pyDelphin interleaves stdout and stderr (in order to read the output properly, especially for generation), the NOTE on stderr lands in the middle of the s-expression on stdout and makes it unreadable by pyDelphin.

Some solutions include:

- de-interleave stdout and stderr when --tsdb-stdout is used
- change ACE so it only flushes stderr after writing a newline to stdout
- load ACE for every input sentence so output can be read until EOF
- ignore stderr entirely when --tsdb-stdout is used
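
As a rough illustration of the first option (de-interleaving), here is a minimal sketch that gives the child process separate stdout and stderr pipes and drains stderr on a background thread, so a NOTE can never land in the middle of a stdout line. This is not pyDelphin's actual code: `start_process`, `send`, and `FAKE_ACE` are hypothetical names, and `FAKE_ACE` is a small stand-in script that mimics the flush ordering described above instead of calling the real ace binary.

```python
import subprocess
import sys
import threading
from queue import Queue

# Hypothetical stand-in for ACE: for each input line it writes half of a
# response to stdout, flushes a NOTE to stderr, then finishes the stdout line.
FAKE_ACE = r'''
import sys
for line in sys.stdin:
    sys.stdout.write('(:p-tokens . "(108')
    sys.stdout.flush()
    sys.stderr.write("NOTE: ignoring `" + line.strip() + "'\n")
    sys.stderr.flush()
    sys.stdout.write(', 9, 10)") (:readings . 0)\n')
    sys.stdout.flush()
'''


def start_process(cmd):
    """Start cmd with separate pipes; collect stderr lines in a queue."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # separate pipe, not stderr=subprocess.STDOUT
        text=True,
        bufsize=1,  # line-buffered on our side
    )
    notes = Queue()

    def drain():
        # Read stderr until the child closes it, so the pipe never fills up.
        for line in proc.stderr:
            notes.put(line.rstrip("\n"))

    threading.Thread(target=drain, daemon=True).start()
    return proc, notes


def send(proc, line):
    """Send one input line and read back exactly one stdout line."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


if __name__ == "__main__":
    proc, notes = start_process([sys.executable, "-c", FAKE_ACE])
    print("stdout:", send(proc, "== Why algorithms are necessary =="))
    print("stderr:", notes.get(timeout=5))
    proc.stdin.close()
    proc.wait()
```

The trade-off is that stdout and stderr are no longer ordered relative to each other, which may matter where the interleaving was being relied on (the issue mentions generation).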

The example sentence is:

这个 学期 我 有 四 门 课 ， 不  太 忙 ， 我 每天 十点一 刻 去 上课 ， 十一点二十分 下课 ， 然后 ， 我 喜欢 去 图书馆 看书 ， 下午 下课 以后 ， 我 常 和 朋友 一起 去 打球 。

It was parsed using a recent version of Zhong (the grammar at zhong/cmn/zhs/).


goodmami · 2017-01-18T20:40:31Z

This was fixed in 8b2b664.

goodmami · 2017-01-24T18:25:20Z

Reopening because the fix doesn't work in some cases. For example (sentence taken from the ws01 corpus of Redwoods):

$ echo "== Why algorithms are necessary: an informal definition =="| ace -g ~/grammars/erg-1214-x86-64-0.9.24.dat -1Tq --tsdb-stdout --report-labels
NOTE: tsdb run: (:application . "answer") (:platform . "gcc 4.2") (:grammar . "ERG (1214)") (:avms . 9280) (:lexicon . 38259) (:lrules . 81) (:rules . 212)
NOTE: lexemes do not span position 0 `=='!
NOTE: post reduction gap
(:ninputs . 10) (:p-input . "(1, 0, 1, <0:2>, 1, \"==\", 0, \"null\", \"VB\" 1.0) (2, 1, 2, <3:6>, 1, \"Why\", 0, \"null\", \"WRB\" 1.0) (3, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\", \"NNS\" 1.0) (4, 3, 4, <18:21>, 1, \"are\", 0, \"null\", \"VBP\" 1.0) (5, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\", \"JJ\" 1.0) (6, 5, 6, <31:32>, 1, \":\", 0, \"null\", \":\" 1.0) (7, 6, 7, <33:35>, 1, \"an\", 0, \"null\", \"DT\" 1.0) (8, 7, 8, <36:44>, 1, \"informal\", 0, \"null\", \"JJ\" 1.0) (9, 8, 9, <45:55>, 1, \"definition\", 0, \"null\", \"NN\" 1.0) (10, 9, 10, <56:58>, 1, \"==\", 0, \"null\", \"NN\" 1.0)") (:copies . 567) (:unifications . 32052) (:ntokens . 25) (:p-tokens . "(94, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\") (96, 3, 4, <18:21>, 1, \"are\", 0, \"null\") (98, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\") (100, 6, 7, <33:35>, 1, \"an\", 0, \"null\") (102, 7, 8, <36:44>, 1, \"informal\", 0, \"null\") (104, 8, 9, <45:55>, 1, \"definition\", 0, \"null\") (106, 5, 6, <31:32>, 1, \":\", 0, \"null\") (108NOTE: ignoring `== Why algorithms are necessary: an informal definition =='
, 9, 10, <56:58>, 1, \"==\", 0, \"null\") (110, 0, 1, <0:2>, 1, \"==\", 0, \"null\") (111, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\") (112, 3, 4, <18:21>, 1, \"are\", 0, \"null\") (113, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\") (114, 7, 8, <36:44>, 1, \"informal\", 0, \"null\") (115, 8, 9, <45:55>, 1, \"definition\", 0, \"null\") (116, 1, 2, <3:6>, 1, \"why\", 0, \"null\") (117, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\") (118, 3, 4, <18:21>, 1, \"are\", 0, \"null\") (119, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\") (120, 6, 7, <33:35>, 1, \"an\", 0, \"null\") (121, 7, 8, <36:44>, 1, \"informal\", 0, \"null\") (122, 8, 9, <45:55>, 1, \"definition\", 0, \"null\") (123, 5, 6, <31:32>, 1, \":\", 0, \"null\") (124, 9, 10, <56:58>, 1, \"==\", 0, \"null\") (125, 0, 1, <0:2>, 1, \"==\", 0, \"null\") (126, 1, 2, <3:6>, 1, \"why\", 0, \"null\")") (:readings . 0) (:pedges . 0) (:aedges . 0)   (:total . 17) (:treal . 17) (:tcpu . 17) (:others . 3424416) (:error . "post-reduction lexical gap")

Trying to parse this with PyDelphin's ACE interface gives this (largely uninformative) error:

At position 0: 
  At position 0: 
  At position 0: Expected to match: -?(0|[1-9]\d*)(\.\d+[eE][-+]?|\.|[eE][-+]?)\d+
  At position 0: Expected to match: -?\d+
  At position 0: 
  At position 0: Expected to match: "[^"\\]*(?:\\.[^"\\]*)*"
  At position 0: Expected to match: (?:[^"\s\(\)\[\]\{\}\\;]+|\\.)+
  At position 12: Expected to match: \)\s*

At first glance, it seems that the items causing errors all have colons in them.

goodmami · 2017-01-31T19:29:44Z

This error seems to be sporadic, as the last time I tested it the symptoms didn't show. Nevertheless, I'll adjust the parsing behavior to be more defensive about this.
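
One way to make the reading side more defensive, sketched here under assumptions and not taken from the eventual fix, is to pull any stderr message out of the combined text before handing it to the s-expression parser. The sketch assumes that an interleaved message always starts with `NOTE: ` and runs through the next newline, as in the output above; `split_interleaved` is a hypothetical helper name.

```python
import re

# Assumption: an interleaved stderr message starts with "NOTE: " and ends at
# the next newline, as in the ACE output shown earlier in this issue.
_NOTE = re.compile(r"NOTE: [^\n]*\n")


def split_interleaved(text):
    """Return (text with NOTE messages removed, list of NOTE messages)."""
    notes = [m.group(0).rstrip("\n") for m in _NOTE.finditer(text)]
    cleaned = _NOTE.sub("", text)
    return cleaned, notes


if __name__ == "__main__":
    raw = (
        '(:p-tokens . "(106, 5, 6) (108NOTE: ignoring `== x ==\'\n'
        ', 9, 10)") (:readings . 0)\n'
    )
    cleaned, notes = split_interleaved(raw)
    print(cleaned, end="")
    print(notes)
```

This heuristic would also cut an input sentence that itself contains the text `NOTE: `, since that string would then appear inside the quoted values on stdout, so it is only a stopgap compared with keeping the two streams separate.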

goodmami added a commit that referenced this issue Nov 15, 2016

Fix #86; deal with interleaved ACE stderr messages

8b2b664

goodmami modified the milestone: v0.6.0 Jan 4, 2017

goodmami closed this as completed Jan 18, 2017

goodmami reopened this Jan 24, 2017

goodmami modified the milestones: v0.6.0, v0.7.0, v0.6.1 Jan 24, 2017

goodmami closed this as completed in f0268ae Mar 23, 2017
