Deviations From Flex Bison
This section currently describes the GerHobbelt fork, which has the
%options easy_keyword_rules
feature, while vanilla jison does not (at least not until a pull request for this is posted by me (@GerHobbelt) and accepted). Hence vanilla jison will work as if you had implicitly specified
%options easy_keyword_rules
in every lexer of yours.
When the lexer 'easy keyword' option has been turned on in your lexer file / section using
%options easy_keyword_rules
you will notice that the token "foo"
will match whole words only, while ("foo")
will match foo
anywhere unless.
See issue #63 and GHO commit 64759c43.
Technically what happens is that
%options easy_keyword_rules
turns on lexer rule inspection: where it recognizes that a rule ends with a literal character, the regex word edge
\b
check is appended to the lexer regex for the given rule.
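The effect of the appended word edge check can be tried out with plain JavaScript regular expressions. This is only a sketch of the behaviour described above, not the code Jison generates; the `^(?:...)` wrapper and the helper names `matchesKeyword` and `matchesAnywhere` are assumptions made for illustration.

```javascript
// Assumption: these two regexes only mimic the described effect;
// they are not taken from Jison's generated lexer.
const keywordRule = new RegExp('^(?:foo\\b)'); // "foo" with the word edge check appended
const anywhereRule = new RegExp('^(?:foo)');   // ("foo") without the check

// Hypothetical helpers for the demonstration.
const matchesKeyword = (input) => keywordRule.test(input);
const matchesAnywhere = (input) => anywhereRule.test(input);

console.log(matchesKeyword('foo bar'));  // true: "foo" is a whole word here
console.log(matchesKeyword('foobar'));   // false: no word edge after "foo"
console.log(matchesAnywhere('foobar'));  // true: matches the "foo" prefix
```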
The lexer will use the first rule that matches the input string unless you use %options flex
, in which case it will use the rule with the longest match.
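The difference between the two matching strategies can be sketched in a few lines of JavaScript. The rule table, the token names and the `nextToken` helper below are hypothetical and only illustrate the idea; in particular, breaking ties in favour of the earlier rule is a choice made for this sketch.

```javascript
// Hypothetical rule table, listed in the order the rules would appear in a lexer.
const rules = [
  { token: 'ASSIGN', pattern: /^(?:=)/ },
  { token: 'EQ', pattern: /^(?:==)/ },
];

// Returns the token chosen for the start of `input`.
// useLongestMatch = false: the first matching rule wins.
// useLongestMatch = true:  the rule with the longest match wins
//                          (on a tie, the earlier rule is kept).
function nextToken(input, useLongestMatch) {
  let best = null;
  for (const rule of rules) {
    const m = rule.pattern.exec(input);
    if (!m) continue;
    const candidate = { token: rule.token, text: m[0] };
    if (!useLongestMatch) return candidate;
    if (best === null || candidate.text.length > best.text.length) {
      best = candidate;
    }
  }
  return best;
}

console.log(nextToken('== 1', false).token); // ASSIGN
console.log(nextToken('== 1', true).token);  // EQ
```

With first-match semantics the order of the rules matters, so a longer operator such as `==` would have to be listed before `=`.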
Because Jison uses JavaScript’s regular expression engine, it is possible to use some metacharacters that are not present in Flex patterns.
For a full list of available regex metacharacters, see the MDN documentation: Using Special Characters
Flex patterns support lookahead using /
; Jison adds negative lookahead using /!
.
Technically what happens is that
/<atom>
and
/!<atom>
are 1:1 replaced by the regex expressions
(?=<atom>)
and
(?!<atom>)
respectively.
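The resulting regex expressions behave as follows in plain JavaScript. This is a minimal sketch: the lexer patterns named in the comments and the `matchText` helper are illustrative assumptions, not output produced by Jison.

```javascript
// Assumed regex form of a lexer pattern such as  "foo"/"bar"  (positive lookahead).
const positive = /^(?:foo(?=bar))/;
// Assumed regex form of a lexer pattern such as  "foo"/!"bar" (negative lookahead).
const negative = /^(?:foo(?!bar))/;

// Hypothetical helper: returns the matched text, or null when there is no match.
const matchText = (re, input) => {
  const m = re.exec(input);
  return m ? m[0] : null;
};

console.log(matchText(positive, 'foobar')); // 'foo' (the lookahead text is not consumed)
console.log(matchText(positive, 'foobaz')); // null
console.log(matchText(negative, 'foobaz')); // 'foo'
console.log(matchText(negative, 'foobar')); // null
```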
Jison supports the following advanced grouping options:
- non-grouping brackets (?:PATTERN)
- positive lookahead (?=PATTERN)
- negative lookahead (?!PATTERN)
The flex macros yymore()
and yyless()
must be rewritten to use the Jison lexer's JavaScript API calls:
- yymore() -> this.more() (See the flex manual and the Jison example "test more()")
- yyless() -> this.less() (See the flex manual and the Jison example "test less()")
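To make the intent of the two calls concrete, here is a toy model of the semantics the flex manual describes for yymore() and yyless(n). `ToyLexer` and its `match` method are hypothetical stand-ins written for this sketch; they are not Jison's lexer, and Jison's own `more()` and `less()` may differ in detail.

```javascript
// ToyLexer is a hypothetical stand-in used only to illustrate the semantics.
class ToyLexer {
  constructor(input) {
    this.input = input;    // text that has not been consumed yet
    this.yytext = '';      // text of the current token
    this.keepText = false; // set by more(), consumed by the next match()
  }

  // Ask for the next match to be appended to yytext instead of replacing it.
  more() {
    this.keepText = true;
  }

  // Keep the first n characters of the current token, push the rest back.
  less(n) {
    this.input = this.yytext.slice(n) + this.input;
    this.yytext = this.yytext.slice(0, n);
  }

  // Try to match an anchored pattern at the start of the remaining input.
  match(pattern) {
    const m = pattern.exec(this.input);
    if (!m) return false;
    this.yytext = (this.keepText ? this.yytext : '') + m[0];
    this.keepText = false;
    this.input = this.input.slice(m[0].length);
    return true;
  }
}

const lexer = new ToyLexer('foobar');
lexer.match(/^foobar/);
lexer.less(3);                      // keep "foo", return "bar" to the input
const afterLess = lexer.yytext;     // 'foo'
const pushedBack = lexer.input;     // 'bar'
lexer.more();                       // the next match extends the current token
lexer.match(/^bar/);
const afterMore = lexer.yytext;     // 'foobar'
console.log(afterLess, pushedBack, afterMore);
```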
Within lexer actions use %{ ... %}
delimiters if you want to use block-style statements, e.g.:
.* %{
    if (true) {
        console.log('test');
    }
    // ...
%}
Within parser actions you may alternatively use {{ .. }}
delimiters for the same purpose:
test
    : STRING EOF {{
        if (true) {
            console.log('test');
        }
        // ...
        return $1;
    }}
    ;
though Jison also supports %{ ... %}
multi-line action blocks in the grammar rules:
test
    : STRING EOF %{
        if (true) {
            console.log('test');
        }
        // ...
        return $1;
    %}
    ;
See issue #85
Actions should contain JavaScript instead of C, naturally.
As of Jison v0.2.8, you no longer need to use double braces {{...}}
around grammar rule
action code blocks.
From now on, single braces {...}
suffice.
There is a short-hand arrow syntax:
exp: ...
| '(' exp ')' -> $2
| exp '+' exp -> $1 + $3
Normally, you'd have to use the position of the corresponding nonterminal or terminal in the production, prefixed by a dollar sign $, e.g.:
exp: ...
| '(' exp ')'
{ $$ = $2; }
Now, you can also access the value by using the name of the nonterminal instead of its position, e.g.:
exp: ...
| '(' exp ')'
{ $$ = $exp; }
If the rule is ambiguous (the nonterminal appears more than once), append a number to the end of the nonterminal name to disambiguate the desired value:
exp: ...
| exp '+' exp
{ $$ = $exp1 + $exp2; }
Association by name leads to a looser coupling (and is easier to grok).
This also works for accessing location information (compare with the Bison manual on Named references and their Actions and Locations section):
exp: ...
| '(' exp ')'
{ @$ = @exp; /* instead of @$ = @2 */ }
Another way to resolve ambiguity would be to use aliases in square brackets, for example:
exp: ...
| exp[left] '+' exp[right]
{ $$ = $left + $right; }
'Auto-numbering' means that the first occurrence of a label (token name or alias) nnn
will also be available as nnn1
, and so on.
In the section above you may have seen one example where the nonterminal names have been auto-numbered to provide unambiguous access to each:
exp: ...
| exp '+' exp
{ $$ = $exp1 + $exp2; }
Note that in every Jison rule production, all the nonterminal names and all the aliases are always also available in 'auto-numbered' form, that is: when the same nonterminal name or alias occurs multiple times in the same rule, the action block can uniquely address a particular nonterminal or alias by using the auto-numbered form.
An example:
test
: subrule[alt] subrule[wicked_middle] subrule[alt] '?'[alt]
%{
// These are all unambiguous and legal to address $1, $2, $3 and $4:
//
// $1 === $subrule1 === $alt1
// $1 === $alt <-- first occurrence also registers the name itself!
// $2 === $subrule2 === $wicked_middle
// $3 === $subrule3 === $alt2
// $4 === $alt3
//
// @1 === @subrule1 === @alt1
// @1 === @alt <-- first occurrence also registers the name itself!
// @2 === @subrule2 === @wicked_middle
// @3 === @subrule3 === @alt2
// @4 === @alt3
%}
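The numbering in the example above can be reproduced with a small JavaScript helper. This is a sketch of the rule as described in this section, not Jison's actual implementation: `autoNumberLabels` and its input format are hypothetical, and it does not attempt to model aliases that collide with nonterminal names.

```javascript
// Hypothetical helper, not part of Jison: maps every label usable in an
// action block to the 1-based position of the symbol it refers to.
function autoNumberLabels(symbols) {
  const counts = new Map();
  const positions = {};
  const isIdentifier = (s) => /^[A-Za-z_]\w*$/.test(s);

  symbols.forEach((symbol, index) => {
    const position = index + 1;
    // A symbol contributes its own name (when it is an identifier, so quoted
    // tokens such as '?' are skipped) and its alias (when it has one).
    const labels = [symbol.name, symbol.alias].filter(
      (label) => typeof label === 'string' && isIdentifier(label)
    );
    for (const label of labels) {
      const n = (counts.get(label) || 0) + 1;
      counts.set(label, n);
      positions[label + n] = position;           // auto-numbered form
      if (n === 1) positions[label] = position;  // first occurrence registers the bare name
    }
  });
  return positions;
}

// subrule[alt] subrule[wicked_middle] subrule[alt] '?'[alt]
const labels = autoNumberLabels([
  { name: 'subrule', alias: 'alt' },
  { name: 'subrule', alias: 'wicked_middle' },
  { name: 'subrule', alias: 'alt' },
  { name: "'?'", alias: 'alt' },
]);

console.log(labels.subrule1, labels.alt1, labels.alt);   // 1 1 1
console.log(labels.subrule2, labels.wicked_middle);      // 2 2
console.log(labels.subrule3, labels.alt2);               // 3 3
console.log(labels.alt3);                                // 4
```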
It is not specified what will happen if you game the system by using aliases with the same name as the nonterminals, e.g.:
exp: ...
| exp[exp] '+' exp[exp]
{ $$ = $exp1 + $exp3 /* 3? Are we sure about this? */; }
Jison now supports EBNF syntax, showcased here.