Deviations From Flex Bison

Ger Hobbelt edited this page Oct 26, 2015 · 19 revisions

Lex Patterns

Literal tokens

WARNING: vanilla zaach/jison has 'easy keyword' support turned on all the time

This section currently describes the GerHobbelt fork, which has the %options easy_keyword_rules switch, while vanilla jison does not (at least not until a pull request for it is posted by me (@GerHobbelt) and accepted).

Hence vanilla jison will work as if you implicitly specified %options easy_keyword_rules in every lexer of yours.

When the lexer 'easy keyword' option has been turned on in your lexer file / section using

%options easy_keyword_rules

you will notice that the token "foo" matches whole words only, while ("foo") matches foo anywhere.

See issue #63 and GHO commit 64759c43.

Under The Hood

Technically, %options easy_keyword_rules turns on lexer rule inspection: where it recognizes that a rule ends with a literal character, the regex word boundary check \b is appended to the lexer regex for that rule.
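
The effect of the appended \b check can be reproduced with plain JavaScript regexes. The sketch below is illustrative only: keywordRegex is a hypothetical helper, not part of Jison, and it assumes that "literal character" means a word character (\w).

```javascript
// Hypothetical helper (not Jison's code): build the regex for a literal rule,
// optionally appending \b the way easy_keyword_rules is described to do.
function keywordRegex(literal, easyKeyword) {
  // Escape regex metacharacters so the literal matches itself.
  const escaped = literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Assumption: only literals ending in a word character get the \b check.
  const needsBoundary = easyKeyword && /\w$/.test(literal);
  return new RegExp('^' + escaped + (needsBoundary ? '\\b' : ''));
}

const withOption = keywordRegex('foo', true);     // /^foo\b/
const withoutOption = keywordRegex('foo', false); // /^foo/

console.log(withOption.test('foo bar'));   // true
console.log(withOption.test('foobar'));    // false
console.log(withoutOption.test('foobar')); // true
```

With the boundary check in place, a keyword rule no longer swallows the first part of a longer identifier such as foobar.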

Longest rule matching

The lexer will use the first rule that matches the input string unless you use %options flex, in which case it will use the rule with the longest match.
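
The difference between the two strategies can be sketched in a few lines of JavaScript. This is a simplified illustration, not Jison's lexer: pickRule and lexRules are hypothetical names, and resolving ties in favour of the earlier rule is an assumption.

```javascript
// Illustrative sketch only: choose a rule by first match or by longest match.
function pickRule(rules, input, flex) {
  let best = null;
  for (let i = 0; i < rules.length; i++) {
    const m = rules[i].exec(input);
    // Keep a match only if it is the first one or strictly longer than the best so far.
    if (m && (best === null || m[0].length > best.text.length)) {
      best = { index: i, text: m[0] };
      if (!flex) break; // default behaviour: the first matching rule wins
    }
  }
  return best;
}

const lexRules = [/^if/, /^[a-z]+/];
console.log(pickRule(lexRules, 'iffy', false)); // { index: 0, text: 'if' }
console.log(pickRule(lexRules, 'iffy', true));  // { index: 1, text: 'iffy' }
```

Without %options flex, rule order therefore matters: more specific rules should come before more general ones.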

Additions

Because Jison uses JavaScript’s regular expression engine, it is possible to use some metacharacters that are not present in Flex patterns.

For a full list of available regex metacharacters, see the MDN documentation: Using Special Characters

Negative Lookahead

Flex patterns support lookahead using /; Jison adds negative lookahead using /!.

Under The Hood

Technically, /<atom> and /!<atom> are replaced 1:1 by the regex expressions (?=<atom>) and (?!<atom>) respectively.
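
The resulting regexes behave as follows in plain JavaScript. The patterns foo and bar and the ^ anchor are made up for this illustration; only the (?=...) and (?!...) forms come from the description above.

```javascript
// foo/bar  would become foo(?=bar): match "foo" only when "bar" follows.
const positive = /^foo(?=bar)/;
// foo/!bar would become foo(?!bar): match "foo" only when "bar" does not follow.
const negative = /^foo(?!bar)/;

console.log(positive.exec('foobar')[0]); // foo (the lookahead text is not consumed)
console.log(positive.test('foobaz'));    // false
console.log(negative.test('foobaz'));    // true
console.log(negative.test('foobar'));    // false
```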

Advanced Grouping Options

Jison supports the following advanced grouping options:

  • non-grouping brackets (?:PATTERN),
  • positive lookahead (?=PATTERN) and
  • negative lookahead (?!PATTERN).
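
As a quick illustration of the first option, the plain JavaScript sketch below compares a capturing group with its non-grouping counterpart. The pattern and input are made up for the example.

```javascript
// (ab) captures; (?:ab) groups for repetition without creating a capture.
const capturing = /^(ab)+(c)/.exec('ababc');
const nonCapturing = /^(?:ab)+(c)/.exec('ababc');

console.log(capturing.length);    // 3: full match, last "ab", "c"
console.log(nonCapturing.length); // 2: full match, "c"
console.log(nonCapturing[1]);     // c
```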

yymore, yyless, etc...

The flex macros yymore() and yyless() must be rewritten to use the Jison lexer's JavaScript API calls:

  • yymore() becomes this.more()
  • yyless(n) becomes this.less(n)

Braces in actions

Within lexer actions use %{ ... %} delimiters if you want to use block-style statements, e.g.:

.*  %{
  if (true) {
    console.log('test');
  }
  // ...
%}

Within parser actions you may alternatively use {{ .. }} delimiters for the same purpose:

test
  : STRING EOF  {{
    if (true) {
      console.log('test');
    }
    // ...
    return $1;
  }}
  ;

though Jison also supports %{ ... %} multi-line action blocks in the grammar rules:

test
  : STRING EOF  %{
    if (true) {
      console.log('test');
    }
    // ...
    return $1;
  %}
  ;

See issue #85

Semantic Actions

Actions should contain JavaScript instead of C, naturally.

Braces

As of Jison v0.2.8, you no longer need to use double braces {{...}} around grammar rule action code blocks.

From now on, single braces {...} suffice.

Short-hand syntax

There is a short-hand arrow syntax, in which -> EXPRESSION stands for the action { $$ = EXPRESSION; }:

 exp:    ...
         | '(' exp ')' -> $2
         | exp '+' exp -> $1 + $3

Accessing values and location information

Normally, you'd have to use the position of the corresponding nonterminal or terminal in the production, prefixed by a dollar sign $, e.g.:

 exp:    ...
         | '(' exp ')'
             { $$ = $2; }

Now, you can also access the value by using the name of the nonterminal instead of its position, e.g.:

 exp:    ...
         | '(' exp ')'
             { $$ = $exp; }

If the reference is ambiguous (the nonterminal appears more than once), append a number to the end of the nonterminal name to select the desired value:

 exp:    ...
         | exp '+' exp
             { $$ = $exp1 + $exp2; }

Association by name leads to a looser coupling (and is easier to grok).

This also works for accessing location information (compare with the Bison manual on Named references and their Actions and Locations section):

 exp:    ...
         | '(' exp ')'
             { @$ = @exp; /* instead of @$ = @2 */ }

Another way to resolve ambiguity would be to use aliases in square brackets, for example:

 exp:    ...
         | exp[left] '+' exp[right]
             { $$ = $left + $right; }

Auto-numbered named accessors

'Auto-numbering' means that the first occurrence of label (token name or alias) nnn will also be available as nnn1, the second as nnn2, and so on.

In the section above you may have seen one example where the nonterminal names have been auto-numbered to provide unambiguous access to each:

 exp:    ...
         | exp '+' exp
             { $$ = $exp1 + $exp2; }

Note that in every Jison rule production, all the nonterminal names and all the aliases are always also available in 'auto-numbered' form, that is: when the same nonterminal name or alias occurs multiple times in the same rule, the action block can uniquely address a particular nonterminal or alias by using the auto-numbered form.

An example:

test
: subrule[alt] subrule[wicked_middle] subrule[alt] '?'[alt]
%{
    // These are all unambiguous and legal to address $1, $2, $3 and $4:
    //
    // $1 === $subrule1 === $alt1
    // $1 === $alt  <-- first occurrence also registers the name itself!
    // $2 === $subrule2 === $wicked_middle
    // $3 === $subrule3 === $alt2
    // $4 === $alt3
    //
    // @1 === @subrule1 === @alt1
    // @1 === @alt  <-- first occurrence also registers the name itself!
    // @2 === @subrule2 === @wicked_middle
    // @3 === @subrule3 === @alt2
    // @4 === @alt3
%}
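
The numbering scheme in the example above can be modelled in a few lines of JavaScript. This is a sketch of the naming rules as described here, not Jison's implementation: namedAccessors and the item shape { symbol, alias } are hypothetical, and skipping quoted literals such as '?' is an assumption.

```javascript
// Hypothetical model of auto-numbered accessors for one production.
// items: [{ symbol: 'subrule', alias: 'alt' }, ...]; alias is optional.
function namedAccessors(items) {
  const counts = new Map(); // label -> number of occurrences seen so far
  const names = {};         // accessor name -> 1-based position
  const register = (label, position) => {
    const n = (counts.get(label) || 0) + 1;
    counts.set(label, n);
    if (n === 1) names[label] = position; // first occurrence registers the bare name
    names[label + n] = position;          // every occurrence gets a numbered name
  };
  items.forEach((item, i) => {
    const position = i + 1;
    // Assumption: only identifier-like symbols can be referenced by name.
    if (/^[A-Za-z_]\w*$/.test(item.symbol)) register(item.symbol, position);
    if (item.alias) register(item.alias, position);
  });
  return names;
}

const names = namedAccessors([
  { symbol: 'subrule', alias: 'alt' },
  { symbol: 'subrule', alias: 'wicked_middle' },
  { symbol: 'subrule', alias: 'alt' },
  { symbol: "'?'", alias: 'alt' },
]);
console.log(names.alt, names.alt1, names.subrule2, names.alt2, names.alt3); // 1 1 2 3 4
```

The printed positions line up with the $alt, $alt1, $subrule2, $alt2 and $alt3 entries listed in the example's comments.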

Caveat Emptor

What happens when you game the system by giving aliases the same name as the nonterminals is left unspecified, e.g.:

exp:    ...
         | exp[exp] '+' exp[exp]
             { $$ = $exp1 + $exp3 /* 3? Are we sure about this? */; }

Extended BNF

Jison now supports EBNF syntax, showcased here.