Replies: 9 comments 5 replies
-
Hi,
-
@ericvergnaud - I made some progress, but this is quite confusing and very non-intuitive to me. I got my two files sorted. In
Now, the last step of my change is to split the lexer grammar file itself into two. I want This is just not acting nicely, though. It is very frustrating, and I can't find definitive documentation on how to use and understand all the ways of combining multiple files into a single grammar. You can see the parser file here and the lexer file here. This works, but I'd like to extract a subset of the lexer file into a separate file too. The part I'd like to extract (and import into the lexer grammar) is currently demarcated by comments inside the lexer grammar file:
In case you're wondering why, I want to auto generate the file to do this kind of thing:
The file would be generated from a JSON file that contains all of the keywords in multiple cultures. This example shows just English and French for the English "binary", but ultimately it would generate that for every keyword and a multiplicity of languages: English, French, Dutch, German...
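For illustration only, a generated keyword file of that kind might look like the sketch below. The grammar name ImperiumKeywords, the rule name BINARY and the French spelling are assumptions made for this example, not taken from the actual project.

```antlr
// ImperiumKeywords.g4 (hypothetical generated file; all names are illustrative)
lexer grammar ImperiumKeywords;

// One rule per keyword, one alternative per human language.
BINARY : 'binary' | 'binaire' ;
```

Each further language would add one more alternative to each keyword rule.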
-
@parrt FYI, an occurrence of the topic of our recent discussion.
-
@ericvergnaud - OK, that fixed it. I was unaware that the imported text appears after the local text; switching things around fixed it. Thanks!
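A minimal sketch of why the order matters, assuming hypothetical file and rule names (ImperiumIdentifiers, BINARY and IDENTIFIER are not from the actual grammar). When two lexer rules match the same text with the same length, ANTLR picks the rule defined first, and rules from an imported grammar end up after the rules written in the importing file. A catch-all identifier rule in the main file would therefore win over imported keywords, whereas the arrangement below keeps the keyword ahead of the identifier rule:

```antlr
// ImperiumLexer.g4 (main lexer grammar; names are illustrative)
lexer grammar ImperiumLexer;

import ImperiumIdentifiers;   // its rules are merged in AFTER the rules below

// Keyword rule written locally, so it precedes IDENTIFIER after the merge.
BINARY : 'binary' | 'binaire' ;

WS : [ \t\r\n]+ -> skip ;
```

```antlr
// ImperiumIdentifiers.g4 (imported lexer grammar; names are illustrative)
lexer grammar ImperiumIdentifiers;

// Catch-all rule; also matches 'binary', so it must come after the keywords.
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]* ;
```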
-
It could be helpful to have a new way to include files, namely an "include" directive rather than only "import". An "include" would insert the text right at the place where the directive is positioned, literally replacing that line with all of the lines from the included file. Then one could include text much more flexibly...
-
Splitting a grammar isn't particularly hard:
There's definitely more to the refactoring, but that usually suffices for simple grammars (ones that don't have options). See trsplit. There is currently no list of ANTLR lexer rules that implements a full Unicode character table (e.g., Latin_Capital_letter_N : 'N';), but it is on my "to do" list. Combined with an "XPath grep", one could extract all the string literals, select the matching rules in the Unicode table, and define rules for string literals of more than one code point, so you don't have to type all these rules manually.
-
I'm going to have to stop. I cannot continue just now; it has become so bewildering that I don't even know what questions to ask any more. Everything has fallen apart, and I was making excellent progress too, or so it seemed. All three .g4 files are being consumed fine, yet I now get errors when I run the parser on input source that previously parsed fine. I need to step away, it's been gruelling...
-
Personally, I don't think I've ever used import, so perhaps you don't need it. That definitely simplifies your life.
-
OK, I'm back in the game! I decided (as one often does in this business) to step back to an earlier point, and it's good. I reverted to the original single .g4 file and now simply use my tool to generate a set of ANTLR4 token defs like this:
and just copy/paste that into the .g4, overwriting the earlier version of that block of statements. It works! This is the initial proof: I can now easily maintain a JSON file of keywords per (human) language, generate the above file and then copy/paste it into the .g4. The effort is 1% more than it would be if I were generating the .g4 and importing it and so on. (This may not be the most efficient way to do what I want, but it does work, and that's fine for the time being.)
-
This has consumed three hours of my time so far, and I'm absolutely stumped.
I want to split a grammar file into two. From the few articles scattered around the web and from examples of real grammars (like the ones for Ada), it seems we can just put the parser rules and the lexer rules into two files, and inside each file name the grammar the same as the file.
I have named the lexer file ImperiumLexer.g4 and the parser file ImperiumParser.g4. ImperiumParser.g4 starts with "parser grammar ImperiumParser;" and ImperiumLexer.g4 starts with "lexer grammar ImperiumLexer;".
But I cannot get this to work. Am I expected to pass both file names to the Antlr4 tool? One name? Just the raw grammar name with no "Parser" or "Lexer" part? I've tried this and more, and nothing works.
I tried putting an import into the parser to import the lexer, and that too caused immense confusion. Look:
Specifying the files in the opposite order:
Here are the two files (they are in the same folder):
I even found a Stack Overflow post that suggested one just specify *.g4, but nope, that too leads to problems...
This naming pattern seems to be used all over the place in the Antlr4 example grammars: Ada, as I said, but also CSharp. These split the files and name the files and grammars exactly as I do. What am I doing wrong?
As I play with this, I wonder why we don't name these files <grammar>.<parser/lexer>.g4. That would isolate the grammar name nicely as a distinct component of the file name. The files themselves could then (internally) name the grammar <grammar>, and the "lexer" and "parser" parts could be left out of the grammar name altogether...
Thx
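For reference, a minimal sketch of the split layout, assuming trivial placeholder rules (IDENTIFIER, WS and translation_unit are invented for this example). The parser grammar does not import the lexer; as far as I know, a parser grammar can only import other parser grammars. Instead it points at the lexer's token vocabulary through the tokenVocab option:

```antlr
// ImperiumLexer.g4
lexer grammar ImperiumLexer;

// Placeholder token rules, for illustration only.
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS         : [ \t\r\n]+ -> skip ;
```

```antlr
// ImperiumParser.g4
parser grammar ImperiumParser;

// Take token types from the lexer grammar; this is an option, not an import.
options { tokenVocab = ImperiumLexer; }

// Placeholder start rule, for illustration only.
translation_unit : IDENTIFIER* EOF ;
```

With this layout the lexer has to be generated first, so that ImperiumLexer.tokens exists when the parser grammar is processed: for example, run antlr4 ImperiumLexer.g4 and then antlr4 ImperiumParser.g4 (antlr4 here is the usual alias for running the ANTLR tool jar).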