Deviations From Flex Bison
This section currently describes the GerHobbelt fork, which has the `%options easy_keyword_rules` feature, while vanilla jison does not (at least not until a pull request for it has been posted by me (@GerHobbelt) and accepted).
Hence vanilla jison works as if you had implicitly specified `%options easy_keyword_rules` in every one of your lexers.
When the lexer 'easy keyword' option has been turned on in your lexer file / section using `%options easy_keyword_rules`, you will notice that the token `"foo"` will match the whole word only, while `("foo")` will match `foo` anywhere.
See issue #63 and GHO commit 64759c43.
Technically, `%options easy_keyword_rules` turns on lexer rule inspection: wherever it recognizes that a rule ends with a literal character, the regex word-boundary check `\b` is appended to the lexer regex for that rule.
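As a rough illustration of that inspection, the following JavaScript sketch appends `\b` to a pattern that ends in a word character. The helper name `addKeywordBoundary` is hypothetical, and the escape-sequence check is a simplified assumption about which trailing characters count as "literal"; this is not Jison's actual code.

```javascript
// Hypothetical helper (not Jison's API): a simplified model of the rule
// inspection enabled by %options easy_keyword_rules.
function addKeywordBoundary(pattern) {
  // Only patterns whose last character is a word character get \b ...
  const endsInWordChar = /\w$/.test(pattern);
  // ... unless that character belongs to an escape such as \n, \t or \x41.
  const endsInEscape =
    /\\(r|f|n|t|v|s|b|c[A-Z]|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[0-7]{1,3})$/.test(pattern);
  return endsInWordChar && !endsInEscape ? pattern + "\\b" : pattern;
}

const keyword = new RegExp("^" + addKeywordBoundary("foo"));   // becomes ^foo\b
const grouped = new RegExp("^" + addKeywordBoundary("(foo)")); // ends in ")": unchanged

console.log(keyword.test("foobar")); // false: no word boundary between "o" and "b"
console.log(grouped.test("foobar")); // true: matches the "foo" prefix
```

This mirrors the `"foo"` vs. `("foo")` difference described above: the closing parenthesis hides the trailing literal from the inspection.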
The lexer will use the first rule that matches the input string unless you use `%options flex`, in which case it will use the rule with the longest match.
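The difference between the two selection strategies can be modeled in a few lines of JavaScript. This is a simplified sketch, not the generated lexer: `pickRule` and the rule objects are hypothetical, and word-boundary insertion (`\b`) is deliberately left out so that only the selection strategy differs.

```javascript
// Simplified model (not Jison's internals): choose a rule for the input
// either by first match (default) or by longest match (%options flex).
function pickRule(rules, input, flex) {
  let best = null;
  for (const rule of rules) {
    const m = rule.re.exec(input); // every re is anchored with ^
    if (!m) continue;
    if (!flex) return { name: rule.name, text: m[0] }; // default: first match wins
    // flex: only a strictly longer match replaces the current best,
    // so ties go to the rule listed first
    if (best === null || m[0].length > best.text.length) {
      best = { name: rule.name, text: m[0] };
    }
  }
  return best; // null when no rule matches
}

const rules = [
  { name: "IF", re: /^if/ },
  { name: "ID", re: /^[a-z]+/ },
];

console.log(pickRule(rules, "iffy", false)); // { name: 'IF', text: 'if' }
console.log(pickRule(rules, "iffy", true));  // { name: 'ID', text: 'iffy' }
```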
Because Jison uses JavaScript's regular expression engine, it is possible to use some metacharacters that are not present in Flex patterns.
For a full list of the available regex metacharacters, see the MDN documentation: Using Special Characters.
Flex patterns support lookahead using `/`; Jison adds negative lookahead using `/!`.
Technically, `/<atom>` and `/!<atom>` are replaced 1:1 by the regex expressions `(?=<atom>)` and `(?!<atom>)` respectively.
Jison supports these advanced grouping options:
- non-capturing brackets `(?:PATTERN)`,
- positive lookahead `(?=PATTERN)`, and
- negative lookahead `(?!PATTERN)`.
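To make the lookahead translation concrete, here is a small JavaScript illustration of the regexes that patterns such as `if/\(` and `if/!\(` correspond to under the 1:1 replacement described above, plus a non-capturing group. The patterns are made up for illustration, and the `^` anchors stand in for the lexer matching at the current input position.

```javascript
// Lexer pattern if/\(   ->  if(?=\()  : "if" only when followed by "("
const ifCall = /^if(?=\()/;
// Lexer pattern if/!\(  ->  if(?!\()  : "if" only when NOT followed by "("
const ifNotCall = /^if(?!\()/;
// Non-capturing group: groups the alternation without creating a capture
const hexNumber = /^(?:0x|0X)[0-9a-fA-F]+/;

console.log(ifCall.exec("if(x)")[0]);        // if  (the lookahead does not consume "(")
console.log(ifCall.test("if x"));            // false
console.log(ifNotCall.test("if x"));         // true
console.log(ifNotCall.test("if(x)"));        // false
console.log(hexNumber.exec("0xFF;").length); // 1: only the whole match, no capture groups
```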
The flex macros `yymore()` and `yyless()` must be rewritten to use the Jison lexer's JavaScript API calls:
- `yymore()` -> `this.more()` (see the flex manual and the Jison example "test more()")
- `yyless(n)` -> `this.less(n)` (see the flex manual and the Jison example "test less()")
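The following toy lexer models what `this.more()` and `this.less(n)` do to `yytext` and the remaining input. It is a hypothetical, heavily simplified class written for illustration only (no locations, no start conditions, first-match rule selection); the real methods live on the lexer object that Jison generates.

```javascript
// Toy model (hypothetical, NOT the generated Jison lexer) of the semantics
// of this.more() and this.less(n), i.e. flex's yymore() and yyless(n).
class ToyLexer {
  constructor(input, rules) {
    this.input = input;   // remaining, unconsumed input
    this.rules = rules;   // [{ re: /^.../, action(lexer) -> token | undefined }]
    this.yytext = "";
    this._more = false;
  }
  more() { this._more = true; }  // keep yytext: the next match is appended to it
  less(n) {                      // keep the first n chars, push the rest back
    this.input = this.yytext.slice(n) + this.input;
    this.yytext = this.yytext.slice(0, n);
  }
  lex() {
    while (this.input.length > 0) {
      const prefix = this._more ? this.yytext : "";
      this._more = false;
      const rule = this.rules.find((r) => r.re.test(this.input)); // first match
      if (!rule) throw new Error("no rule matches: " + this.input);
      const text = rule.re.exec(this.input)[0];
      this.input = this.input.slice(text.length);
      this.yytext = prefix + text;
      const token = rule.action(this);
      if (token !== undefined) return token; // no token: keep scanning
    }
    return "EOF";
  }
}

// more(): "foo" is kept and "bar" is appended to it.
const lexer = new ToyLexer("foobar", [
  { re: /^foo/, action: (l) => { l.more(); } },
  { re: /^bar/, action: (l) => "WORD(" + l.yytext + ")" },
]);
console.log(lexer.lex()); // WORD(foobar)

// less(3): only "foo" is kept; "bar" goes back onto the input.
const lexer2 = new ToyLexer("foobar", [
  { re: /^foobar/, action: (l) => { l.less(3); return "HEAD(" + l.yytext + ")"; } },
  { re: /^bar/, action: (l) => "TAIL(" + l.yytext + ")" },
]);
console.log(lexer2.lex()); // HEAD(foo)
console.log(lexer2.lex()); // TAIL(bar)
```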
Within lexer actions, use `%{ ... %}` delimiters if you want to use block-style statements, e.g.:
.* %{
if (true) {
console.log('test');
}
// ...
%}
Within parser actions you may alternatively use `{{ ... }}` delimiters for the same purpose:
test
: STRING EOF {{
if (true) {
console.log('test');
}
// ...
return $1;
}}
;
though Jison also supports `%{ ... %}` multi-line action blocks in the grammar rules:
test
: STRING EOF %{
if (true) {
console.log('test');
}
// ...
return $1;
%}
;
See issue #85
Actions should contain JavaScript instead of C, naturally.
As of Jison v0.2.8, you no longer need to use double braces `{{...}}` around grammar rule action code blocks; single braces `{...}` suffice.
There is also a shorthand arrow syntax, where `-> expression` is equivalent to the action `{ $$ = expression; }`:
exp: ...
| '(' exp ')' -> $2
| exp '+' exp -> $1 + $3
Normally, you'd have to use the position of the corresponding nonterminal or terminal in the production, prefixed by a dollar sign `$`, e.g.:
exp: ...
| '(' exp ')'
{ $$ = $2; }
Now, you can also access the value by using the name of the nonterminal instead of its position, e.g.:
exp: ...
| '(' exp ')'
{ $$ = $exp; }
If the rule is ambiguous (the nonterminal appears more than once), append a number to the end of the nonterminal name to disambiguate the desired value:
exp: ...
| exp '+' exp
{ $$ = $exp1 + $exp2; }
Association by name leads to looser coupling (and is easier to grok).
This also works for accessing location information (compare with the Bison manual's "Named References" and "Actions and Locations" sections):
exp: ...
| '(' exp ')'
{ @$ = @exp; /* instead of @$ = @2 */ }
Another way to resolve ambiguity would be to use aliases in square brackets, for example:
exp: ...
| exp[left] '+' exp[right]
{ $$ = $left + $right; }
'Auto-numbering' means that the first occurrence of a label (token name or alias) `nnn` will also be available as `nnn1`, the second as `nnn2`, and so on.
In the section above you may have seen one example where the nonterminal names have been auto-numbered to provide unambiguous access to each:
exp: ...
| exp '+' exp
{ $$ = $exp1 + $exp2; }
Note that in every Jison rule production, all the nonterminal names and all the aliases are always also available in 'auto-numbered' form, that is: when the same nonterminal name or alias occurs multiple times in the same rule, the action block can uniquely address a particular nonterminal or alias by using the auto-numbered form.
An example:
test
: subrule[alt] subrule[wicked_middle] subrule[alt] '?'[alt]
%{
// These are all unambiguous and legal to address $1, $2, $3 and $4:
//
// $1 === $subrule1 === $alt1
// $1 === $alt <-- first occurrence also registers the name itself!
// $2 === $subrule2 === $wicked_middle
// $3 === $subrule3 === $alt2
// $4 === $alt3
//
// @1 === @subrule1 === @alt1
// @1 === @alt <-- first occurrence also registers the name itself!
// @2 === @subrule2 === @wicked_middle
// @3 === @subrule3 === @alt2
// @4 === @alt3
%}
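The naming rules above are mechanical enough to model in a few lines. The following JavaScript sketch builds the `$name` to position table for one production as this section describes it for the GerHobbelt fork. `buildReferenceMap` and the `{ name, alias }` input format are hypothetical, and the sketch makes no claim about the "aliases named like nonterminals" case or about vanilla zaach/jison.

```javascript
// Hypothetical model of the naming rules described above (GerHobbelt fork);
// not Jison's implementation. Maps every name usable as $name / @name in an
// action to its 1-based position in the production.
function buildReferenceMap(symbols) {
  const counts = new Map(); // label -> occurrences seen so far
  const refs = new Map();   // "label" or "labelN" -> position
  const register = (label, pos) => {
    const n = (counts.get(label) || 0) + 1;
    counts.set(label, n);
    refs.set(label + n, pos);          // auto-numbered form: alt1, alt2, ...
    if (n === 1) refs.set(label, pos); // first occurrence also registers the bare label
  };
  symbols.forEach(({ name, alias }, i) => {
    const pos = i + 1;
    // quoted literals such as '?' are not identifiers, so they get no name
    if (/^[A-Za-z_]\w*$/.test(name)) register(name, pos);
    if (alias) register(alias, pos);
  });
  return refs;
}

// test: subrule[alt] subrule[wicked_middle] subrule[alt] '?'[alt]
const refs = buildReferenceMap([
  { name: "subrule", alias: "alt" },
  { name: "subrule", alias: "wicked_middle" },
  { name: "subrule", alias: "alt" },
  { name: "'?'", alias: "alt" },
]);
console.log(refs.get("alt"), refs.get("alt1"), refs.get("alt2"), refs.get("alt3")); // 1 1 3 4
console.log(refs.get("subrule2"), refs.get("wicked_middle"));                       // 2 2
```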
The above doesn't say what happens if you game the system by using aliases with the same names as the nonterminals, e.g.:
exp: ... | exp[exp] '+' exp[exp] { $$ = $exp1 + $exp3 /* 3? Are we sure about this? */; }
If you're wondering, RTFC: vanilla vs. RTFC: GerHobbelt
WARNING: vanilla zaach/jison doesn't behave the same when it comes to mixing aliases and nonterminal names.
This section currently describes the GerHobbelt fork. With vanilla zaach/jison, the safe rule of thumb is: when you specify an alias for a nonterminal, you SHOULD NOT USE the nonterminal name itself any more in your action code.
RTFC to compare and check each one's behaviour here: vanilla vs. GerHobbelt
Jison now supports EBNF syntax, showcased here.
EBNF is accepted by the jison grammar engine and transposed to a BNF grammar using equivalence transforms for each of the EBNF `*`, `+`, `?` and `(...)` operators.
For these EBNF wildcards & groups, keep the following treatment in mind:
- Only the outermost wildcarded group's label or index is addressable in your action. That group is translated to a single nonterminal, e.g.
      rule: A (B C D E)?
  becomes
      rule: A subrule_option0;
      subrule_option0: /* nil */ | B C D E;
  hence your action block for rule `rule` will only have `$1` and `$2` (the `subrule_option0` nonterminal) to play with.
  As jison allows labeling the wildcarded group, such an alias might keep things more readable:
      rule: A (B C D E)?[choice]
  becomes
      rule: A subrule_option0[choice];
      subrule_option0: /* nil */ | B C D E;
  WARNING: it's illegal to attempt to access `$B`, `$C` et al. from your `rule`'s action code block, and very bad things will happen to you if you do:
  - vanilla zaach/jison will not translate those references and your code will be TOAST.
  - GerHobbelt/jison analyzes your action code chunk, attempts to locate all your `$whatever` and `@whatever` references in there, and barfs a hairball (i.e. fails at jison compile time) with a big fat error message if you do. Do note that we are a little dumb scanner, so we will recognize those references even when they sit in a nice cozy comment in there! [Edit: not since GerHobbelt/jison@80b6de1d311778d2bdfd71a2c39db570049d092a -> GerHobbelt/jison@0.4.17-132; since then the action code macro expansion is smart enough to skip (almost all) strings and comments.]
- `(...)*`, `(...)+` and `(...)?` are the wildcarded ones and will be rewritten to equivalent BNF rules. You MAY nest these constructs.
- The `(...)` group is also recognized (no wildcard operator there): it will be unrolled, unless there's a label attached to it, in which case it's rewritten. Hence
      rule: A (B C D E)
  becomes
      rule: A B C D E;
  while
      rule: A (B C D E)[groupies]
  becomes
      rule: A subrule[groupies];
      subrule: B C D E;
  so be aware that a little change like that can play havoc on your (action) code: the former, unrolled, grouping gives you access to all its terms (nonterminals and terminals alike), while the labeled a.k.a. aliased version hides those inner terms from you.
- In order to have something decent to work with in your action code, every wildcarded or non-wildcarded group which is not unrolled will collect all its terms' values (`yytext`) as produced by the lexer and store them in an array, thus constructing a Poor Man's AST:
      rule: A (B C+ (D E)[hoobahop])?[choice]
  becomes
      rule: A subrule_option0[choice];
      subrule_option0: /* nil */ | subrule_option1;
      subrule_option1: B C+ (D E)[hoobahop];
  which becomes
      rule: A subrule_option0[choice];
      subrule_option0: /* nil */ | subrule_option1;
      subrule_option1: B subrule_series1 hoobahop_group0;
      subrule_series1: subrule_series1 C | C;
      hoobahop_group0: D E;
  which will deliver in your `$choice` reference an array shaped like this (comments show the origin of each bit):
      // subrule_option0
      [
        // NOTE: as this is a choice, you get only the value
        //
        //     undefined
        //
        // when you've hit the nil (epsilon) choice instead!
        // subrule_option1:
        [
          B,
          // subrule_series1
          [
            // the EBNF rewriter is smart enough to see that there's
            // only 1 (one) term in this one: `C`, so no extra arrays-in-array
            // for you here:
            C, C, ...
          ],
          // hoobahop_group0
          [ D, E ]
        ]
      ]
The above is written for the GerHobbelt fork, as EBNF support in vanilla zaach/jison is currently ever so slightly b0rked.
But that's not what this warning is about!
As I (GerHobbelt) write this, I wonder if this really, really is the truth. It may be that the current bleeding edge (master branch) still has a few, ahh..., sub-optimalities in reality compared to the above.
To Be Checked
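To visualize the `$choice` value shape from the comment block above, here are hypothetical hand-written JavaScript stand-ins for the semantic actions of the rewritten rules. The function names are invented, and the extra array around the `subrule_option1` value simply mirrors the documented shape, which the note above flags as not yet verified; treat this as a sketch of the intent, not of what any Jison version generates.

```javascript
// Hypothetical stand-ins (not generated by Jison) for the actions of
//   subrule_option0: /* nil */ | subrule_option1;
//   subrule_option1: B subrule_series1 hoobahop_group0;
//   subrule_series1: subrule_series1 C | C;
//   hoobahop_group0: D E;
const hoobahopGroup0 = (d, e) => [d, e];
const seriesFirst = (c) => [c];               // subrule_series1: C
const seriesNext = (list, c) => [...list, c]; // subrule_series1: subrule_series1 C (stays flat)
const option1 = (b, series, group) => [b, series, group];
const option0 = (value) => [value];           // subrule_option0: subrule_option1
const option0Nil = () => undefined;           // subrule_option0: /* nil */

// Token values (yytext) for the input "A B C C D E", reduced bottom-up:
let series = seriesFirst("C");
series = seriesNext(series, "C");
const choice = option0(option1("B", series, hoobahopGroup0("D", "E")));

console.log(JSON.stringify(choice)); // [["B",["C","C"],["D","E"]]]
console.log(option0Nil());           // undefined: the nil branch was taken
```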
The next blurb is copy-pasta'd from the gist listed further above. It has some details which need to be updated in the docs too...
Some improvements have been made for parser and lexer grammars in Jison 0.3 (demonstrated in the FlooP/BlooP example below).
For lexers:
- Patterns may use unquoted characters instead of strings
- Two new options, `%options flex case-insensitive`:
  - `flex`: the rule with the longest match is used, and no word boundary patterns are added
  - `case-insensitive`: all patterns are case insensitive
- The user code section is included in the generated module
For parsers:
- Arrow syntax for semantic actions
- EBNF syntax (enabled using the `%ebnf` declaration)
  - Operators include repetition (`*`), non-empty repetition (`+`), grouping (`()`), alternation within groups (`|`), and option (`?`)
- The user code section and code blocks are included in the generated module
Also, Robert Plummer has created a PHP port of Jison's parser.
See the grammar below for more examples.