Error with ACE and --tsdb-stdout with long unparsable sentences #86

Closed
goodmami opened this issue Nov 11, 2016 · 3 comments

@goodmami
Member

When ACE v0.9.24 is installed, PyDelphin defaults to using the --tsdb-stdout option in order to get more information out of ACE. NOTEs and other messages are still printed to stderr as normal. When a long unparsable sentence causes ACE to emit a NOTE, that message may be flushed to stderr before the stdout line has been fully written. Because PyDelphin interleaves stdout and stderr (in order to read the output properly, especially for generation), the NOTE lands in the middle of the s-expression from stdout and makes it unreadable by PyDelphin.

Some solutions include:

  • de-interleave stdout and stderr when --tsdb-stdout is used
  • change ACE so it only flushes stderr after writing a newline to stdout
  • load ACE for every input sentence so output can be read until EOF
  • ignore stderr entirely when --tsdb-stdout is used
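As a rough illustration of the first option, here is a minimal sketch of reading the two streams through separate pipes, so a message on stderr can never land inside a stdout line. It is not PyDelphin's actual implementation: `run_separately` and the stand-in child process `CHILD` are hypothetical names, and the child only imitates the flushing pattern described above rather than invoking ACE. It is also one-shot (it reads until EOF), so it resembles the "load ACE for every input" option in that respect.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for ACE: writes half an s-expression to stdout,
# flushes a NOTE to stderr, then finishes the stdout line.
CHILD = r'''
import sys
sys.stdin.readline()
sys.stdout.write('(:readings . 0) ')
sys.stdout.flush()
sys.stderr.write('NOTE: ignoring input\n')
sys.stderr.flush()
sys.stdout.write('(:error . "gap")\n')
sys.stdout.flush()
'''

def run_separately(cmd, input_text):
    """Run cmd and return (stdout_lines, stderr_lines) from separate pipes."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # not subprocess.STDOUT, so the streams stay apart
        universal_newlines=True,
    )
    stderr_lines = []

    def drain_stderr():
        # Read stderr on its own thread so a full pipe cannot block the child.
        for line in proc.stderr:
            stderr_lines.append(line.rstrip("\n"))

    reader = threading.Thread(target=drain_stderr)
    reader.start()
    proc.stdin.write(input_text)
    proc.stdin.close()
    stdout_lines = [line.rstrip("\n") for line in proc.stdout]
    reader.join()
    proc.wait()
    return stdout_lines, stderr_lines

out_lines, err_lines = run_separately([sys.executable, "-c", CHILD], "a sentence\n")
```

With separate pipes the s-expression arrives as one intact line, and the NOTE is still available for logging. The trade-off is that the relative ordering of stdout and stderr messages is lost, which may matter for the generation case mentioned above.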

The example sentence is:

这个 学期 我 有 四 门 课 , 不  太 忙 , 我 每天 十点一 刻 去 上课 , 十一点二十分 下课 , 然后 , 我 喜欢 去 图书馆 看书 , 下午 下课 以后 , 我 常 和 朋友 一起 去 打球 。

It was parsed using a recent version of Zhong (the grammar at zhong/cmn/zhs/).

@goodmami goodmami modified the milestone: v0.6.0 Jan 4, 2017
@goodmami
Member Author

This was fixed in 8b2b664.

@goodmami
Member Author

Reopening because the fix doesn't work in some cases. For example, with this sentence taken from the ws01 corpus of Redwoods, the final NOTE lands in the middle of the :p-tokens value:

$ echo "== Why algorithms are necessary: an informal definition =="| ace -g ~/grammars/erg-1214-x86-64-0.9.24.dat -1Tq --tsdb-stdout --report-labels
NOTE: tsdb run: (:application . "answer") (:platform . "gcc 4.2") (:grammar . "ERG (1214)") (:avms . 9280) (:lexicon . 38259) (:lrules . 81) (:rules . 212)
NOTE: lexemes do not span position 0 `=='!
NOTE: post reduction gap
(:ninputs . 10) (:p-input . "(1, 0, 1, <0:2>, 1, \"==\", 0, \"null\", \"VB\" 1.0) (2, 1, 2, <3:6>, 1, \"Why\", 0, \"null\", \"WRB\" 1.0) (3, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\", \"NNS\" 1.0) (4, 3, 4, <18:21>, 1, \"are\", 0, \"null\", \"VBP\" 1.0) (5, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\", \"JJ\" 1.0) (6, 5, 6, <31:32>, 1, \":\", 0, \"null\", \":\" 1.0) (7, 6, 7, <33:35>, 1, \"an\", 0, \"null\", \"DT\" 1.0) (8, 7, 8, <36:44>, 1, \"informal\", 0, \"null\", \"JJ\" 1.0) (9, 8, 9, <45:55>, 1, \"definition\", 0, \"null\", \"NN\" 1.0) (10, 9, 10, <56:58>, 1, \"==\", 0, \"null\", \"NN\" 1.0)") (:copies . 567) (:unifications . 32052) (:ntokens . 25) (:p-tokens . "(94, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\") (96, 3, 4, <18:21>, 1, \"are\", 0, \"null\") (98, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\") (100, 6, 7, <33:35>, 1, \"an\", 0, \"null\") (102, 7, 8, <36:44>, 1, \"informal\", 0, \"null\") (104, 8, 9, <45:55>, 1, \"definition\", 0, \"null\") (106, 5, 6, <31:32>, 1, \":\", 0, \"null\") (108NOTE: ignoring `== Why algorithms are necessary: an informal definition =='
, 9, 10, <56:58>, 1, \"==\", 0, \"null\") (110, 0, 1, <0:2>, 1, \"==\", 0, \"null\") (111, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\") (112, 3, 4, <18:21>, 1, \"are\", 0, \"null\") (113, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\") (114, 7, 8, <36:44>, 1, \"informal\", 0, \"null\") (115, 8, 9, <45:55>, 1, \"definition\", 0, \"null\") (116, 1, 2, <3:6>, 1, \"why\", 0, \"null\") (117, 2, 3, <7:17>, 1, \"algorithms\", 0, \"null\") (118, 3, 4, <18:21>, 1, \"are\", 0, \"null\") (119, 4, 5, <22:31>, 1, \"necessary\", 0, \"null\") (120, 6, 7, <33:35>, 1, \"an\", 0, \"null\") (121, 7, 8, <36:44>, 1, \"informal\", 0, \"null\") (122, 8, 9, <45:55>, 1, \"definition\", 0, \"null\") (123, 5, 6, <31:32>, 1, \":\", 0, \"null\") (124, 9, 10, <56:58>, 1, \"==\", 0, \"null\") (125, 0, 1, <0:2>, 1, \"==\", 0, \"null\") (126, 1, 2, <3:6>, 1, \"why\", 0, \"null\")") (:readings . 0) (:pedges . 0) (:aedges . 0)   (:total . 17) (:treal . 17) (:tcpu . 17) (:others . 3424416) (:error . "post-reduction lexical gap")

Trying to parse this with PyDelphin's ACE interface gives this (largely uninformative) error:

At position 0: 
  At position 0: 
  At position 0: Expected to match: -?(0|[1-9]\d*)(\.\d+[eE][-+]?|\.|[eE][-+]?)\d+
  At position 0: Expected to match: -?\d+
  At position 0: 
  At position 0: Expected to match: "[^"\\]*(?:\\.[^"\\]*)*"
  At position 0: Expected to match: (?:[^"\s\(\)\[\]\{\}\\;]+|\\.)+
  At position 12: Expected to match: \)\s*

At first glance, it seems that the items causing errors all have colons in them.

@goodmami goodmami reopened this Jan 24, 2017
@goodmami goodmami modified the milestones: v0.6.0, v0.7.0, v0.6.1 Jan 24, 2017
@goodmami
Member Author

This error seems to be sporadic, as the last time I tested it the symptoms didn't show. Nevertheless, I'll adjust the parsing behavior to be more defensive about this.
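One way to be more defensive, sketched here under assumptions rather than taken from the actual fix, is to cut any `NOTE: ...` message out of the captured text before handing it to the s-expression reader. The helper name `strip_notes` is hypothetical, and the sketch assumes that each interleaved message starts with `NOTE: `, ends at the next newline, and that the string `NOTE: ` never occurs inside legitimate s-expression data.

```python
import re

# Assumption: an interleaved stderr message runs from "NOTE: " to the next newline.
_NOTE = re.compile(r'NOTE: [^\n]*\n')

def strip_notes(captured):
    """Return (cleaned, notes): the captured text with NOTE messages removed."""
    notes = [m.group(0).rstrip('\n') for m in _NOTE.finditer(captured)]
    cleaned = _NOTE.sub('', captured)
    return cleaned, notes

# Shortened imitation of the output above: one NOTE on its own line and
# one NOTE flushed into the middle of the :p-tokens string.
captured = (
    'NOTE: post reduction gap\n'
    '(:ninputs . 10) (:p-tokens . "(106, 5, 6) (108'
    "NOTE: ignoring `x'\n"
    ', 9, 10)") (:readings . 0)\n'
)
cleaned, notes = strip_notes(captured)
```

Removing the message together with its trailing newline rejoins the two halves of the stdout line, so the token that was split as `(108` and `, 9, 10)` becomes `(108, 9, 10)` again. Other message prefixes that ACE may print would need their own patterns.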
