Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] The Python grammars in this repo are in a terrible state! #3539

Closed
kaby76 opened this issue Jun 16, 2023 · 4 comments
Closed

[python] The Python grammars in this repo are in a terrible state! #3539

kaby76 opened this issue Jun 16, 2023 · 4 comments

Comments

@kaby76
Copy link
Contributor

kaby76 commented Jun 16, 2023

There are 10 python grammars in this repo. I don't know where to even begin on what to pick and maintain because they are all terrible.

  • It's disorganized. Which grammar is "the one" to use and maintain?
  • The python3/ grammar is behind the current version of Python
  • Some of the grammars are poorly tested. E.g., python3alt has one--yes, one--test! python3-cpp has no tests. python3-ty has no tests. python3-py has two glorious tests.
  • There are forked grammars for different targets (python2-js, python3-cpp, python3-py, python3-ts). How far are they behind python2 or python3/ or is that python/?? (Maybe the target-specific grammars are ahead in features from these three.)
  • There are probably a dozen or two open issues for the Python grammars.
  • Targets are missing for python, python2, and python3 grammars. For example, there is no python3/Cpp. By the way, does python3-cpp/ work? Are the grammars python3/ and python3-cpp/ the same even if we take into account the split/combined difference?

The plan to reorganize

  • Nuke target-specific grammars like python3-cpp.
  • Nuke python3-without-actions. It's very unlikely that that grammar works for any reasonable set of input, so we should not pretend that a grammar without actions is going to work.
  • Nuke python3alt. Bart hasn't updated this for 2 years, and it's far behind the current version of Python3.
  • Nuke tiny-python/. This grammar hasn't been updated for 3 or 4 years. It contains a version that have actions and one that doesn't. It's really impossible to know what to make of this grammar.
  • Decide what to do of the remaining grammars (python, python2, python3).
  • Make ports for all targets.
@kaby76 kaby76 changed the title [python] The Python grammars in this repo are terrible! [python] The Python grammars in this repo are in a terrible state! Jun 16, 2023
@kaby76
Copy link
Contributor Author

kaby76 commented Jun 17, 2023

The python/python3 grammar has a JavaScript port, but it is ripping slow in comparison with the CSharp and Java ports.

06/17-06:42:17 ~/issues/g4-3539/python/python3/Generated-CSharp
$ bash run.sh ../examples/base_events.py
CSharp 0 ../examples/base_events.py success 1.0922569
Total Time: 1.1610637
06/17-06:44:41 ~/issues/g4-3539/python/python3/Generated-CSharp
$ pushd
~/issues/g4-3539/python/python3/Generated-JavaScript ~/issues/g4-3539/python/python3/Generated-CSharp
06/17-06:44:44 ~/issues/g4-3539/python/python3/Generated-JavaScript
$ bash run.sh ../examples/base_events.py
JavaScript 0 ../examples/base_events.py success 101.734
Total Time: 101.746
06/17-06:46:30 ~/issues/g4-3539/python/python3/Generated-JavaScript

The JavaScript port is typically slower than CSharp by a factor of 5, not 100x's.

For example, abb:

$ bash run.sh ../examples/robdata.sys
CSharp 0 ../examples/robdata.sys success 0.0331986
Total Time: 0.0832029
06/17-07:06:31 ~/issues/g4-3539/abb/Generated-CSharp
$ pushd
~/issues/g4-3539/abb/Generated-JavaScript ~/issues/g4-3539/abb/Generated-CSharp ~/issues/g4-3539/python/python3/Generated-CSharp
06/17-07:06:34 ~/issues/g4-3539/abb/Generated-JavaScript
$ bash run.sh ../examples/robdata.sys
JavaScript 0 ../examples/robdata.sys success 0.098
Total Time: 0.105
06/17-07:06:36 ~/issues/g4-3539/abb/Generated-JavaScript
$

The JavaScript port should not be this slow in comparison to the CSharp port. The parser traces of the closure compuations are identical. I do notice several; issues:

  1. this grammar does not use optimized Antlr4 syntax for expressions, which can cause the parse trees to have large single child chains;
  2. the max-k for atom-expr is huge, and there is a large error count associated with the "trailers*" reference in the rule. I wonder whether there is a missing semantic predicate.
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors
...
155       atom_expr               2796         2.312377  4922     28     6         0            324
...
  1. I wonder whether the JavaScript runtime handles "errors" efficiently.

@kaby76
Copy link
Contributor Author

kaby76 commented Jun 17, 2023

Ther perf data was mislabeled. Here's what it should be. (trperf needs to be fixed.)

Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
...
155       atom_expr               2796         6.195964  4922     28     6         0            0       324

What I'm thinking is that the parser reads too far ahead per trailer in the atom_expr rule. It requires context in the parse 6 times, too, but I don't think that is the problem--most are SLL transitions, not LL. It feels like there is an anti-pattern here.

teverett pushed a commit that referenced this issue Jul 8, 2023
* Fixes for #3539

Typescript cannot work because the declaration of the constructor for CommonToken() is wrong.

* Fix python/python3/TypeScript

* Remove grammars that only add to confusion.

* Update Dart port, but this cannot work because Antlr 4.13.0 Dart runtime missing types.

* Add Dart port for python3 grammar.

This base class for Dart works, but requires changes to Antlr. antlr/antlr4#4321

* Updates, but incomplete, for Cpp.

* Additional changes.

* Updates to get Cpp target to link.

* Changes for working Cpp target.

* Add in Cpp to workflow.

* Updates.

* Adjust to for rebuild.

* Remove bothersome tabs from source code.

* Rename tests into "small" and "large" tests to reflect that "slow parsers work on the small test suite".

* Fix names of test directories in desc.xml.

* Update for Go target. Remove python3-cpp.

* Getting Go target compiling--does not work yet.

* Fixes for Go target of python3 grammar.

This port works but only if the Go runtime is fixed, and "Virt" is assigned in the driver. See antlr/antlr4#4343 antlr/antlr4#4342

* Fix desc.xml
rvrooman-codelogic pushed a commit to CodeLogicIncEngineering/grammars-v4-public-fork that referenced this issue Jul 27, 2023
* Fixes for antlr#3539

Typescript cannot work because the declaration of the constructor for CommonToken() is wrong.

* Fix python/python3/TypeScript

* Remove grammars that only add to confusion.

* Update Dart port, but this cannot work because Antlr 4.13.0 Dart runtime missing types.

* Add Dart port for python3 grammar.

This base class for Dart works, but requires changes to Antlr. antlr/antlr4#4321

* Updates, but incomplete, for Cpp.

* Additional changes.

* Updates to get Cpp target to link.

* Changes for working Cpp target.

* Add in Cpp to workflow.

* Updates.

* Adjust to for rebuild.

* Remove bothersome tabs from source code.

* Rename tests into "small" and "large" tests to reflect that "slow parsers work on the small test suite".

* Fix names of test directories in desc.xml.

* Update for Go target. Remove python3-cpp.

* Getting Go target compiling--does not work yet.

* Fixes for Go target of python3 grammar.

This port works but only if the Go runtime is fixed, and "Virt" is assigned in the driver. See antlr/antlr4#4343 antlr/antlr4#4342

* Fix desc.xml
@pramit-j2-sl
Copy link

I have two suggestions:

  • Create the final output in a single grammar file, rather than two files.
  • Fetch from actual Python repo to import gram files and convert them to antlr4-g4 grammar file.

@kaby76
Copy link
Contributor Author

kaby76 commented Oct 18, 2024

I'm going to close the issue I opened because the python grammars are actually in an "okay state", and they are getting better with @RobEin on top of it.

Create the final output in a single grammar file rather than two files.

This cannot work given the constraints.

The grammar has to be split because the lexer has modes. Antlr does not accept a combined grammar with lexer modes. Further, the lexer needs to be in "target agnostic format" because if it were not, there would be forking of the .g4s for every target type. This is exactly how we got into a bad state to start: people would modify one target and not maintain the other targets. With target agnostic, there is one grammar for all targets.

Fetch from actual Python repo to import gram files and convert them to antlr4-g4 grammar file.

That is the plan.

@kaby76 kaby76 closed this as completed Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants