RY edited this page Jan 9, 2022 · 24 revisions

The library's main purpose is to provide an easy, straightforward and flexible way to map source text to a desired format, with a variety of options and customizations available. The basic components involved in the workflow, and the relationships between them, are shown in the following diagram:

  • Grammar defines the primary source mapping units (terminals) and the set of rules (productions) describing how they are reduced to a target structure.
  • Each terminal is expressed by a regular expression and can be easily constructed from the primitives included in the library.
  • ParserBase defines the primary abstract interface for the parser. Part of it has to be implemented in the custom parser, while the rest is generated automatically from the grammar provided.
  • Custom parser defines the public interaction interface and is also responsible for the source reduction logic.
  • Final parser is a dynamic CLR type with both lexical and syntactic analyzers generated, which guarantees that all custom-defined reducers are executed in the expected order.

Regular Expressions

Regular Expressions are constructed from nodes of the following types:

  • Text nodes match a single position within the source and support any combination of Unicode character codes, ranges and Unicode categories:
Rex.Char('r');
Rex.Char(@"a-z0-9");
Rex.Char(@"\u{0|10-ff|ccc|10000-10FFFF}");
Rex.Char(@"\p{Cc|Cf|Cn|Co|Cs|C|Ll|Lm|Lo|Lt|Lu|L|Mc|Me|Mn|M|Nd|Nl|No|N|Pc|Pd|Pe|Po|Ps|Pf|Pi|P|Sc|Sk|Sm|So|S|Zl|Zp|Zs|Z}");
Rex.Char(@"\a\b\t\r\n\v\f\\");

There is also a convenient way to configure a match for any character (or any set in general) excluding specific characters, ranges or categories:

Rex.Char(@"0-10ffff-[\p{L}]");
Rex.Char(@"0-10ffff-[a]");
Rex.Except(@"\p{L}");
Rex.Except('a');
  • Text nodes are combined into an expression by the OR, AND (concatenation) and REPEAT patterns:
Rex.Char('a').Then(Rex.Char('b')).Then(Rex.Char('b'));
Rex.Text("abb");
Rex.Or(Rex.Char('a'), Rex.Char('b')).NoneOrMore().Then("abb"); // (a|b)*abb
  • A special kind of conditional expression is also available. The idea is to define a sub-expression that is checked at some position in a pattern; evaluation proceeds to the following state only if the sub-expression has succeeded (positive lookahead) or failed (negative lookahead). Every time the evaluation of a conditional expression completes, the current text position is reset to where the evaluation started:
Rex.IfNot("-->").Then(Rex.AnyChar).NoneOrMore();
Rex.Char('a').NoneOrMore().FollowedBy('b');
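
For readers more familiar with standard .NET regex syntax, the composition and lookahead examples above correspond roughly to the patterns below. This is only an analogy built on System.Text.RegularExpressions, assuming the library's conditional nodes behave like ordinary regex lookaheads; it is not how the library evaluates expressions. The RexAnalogy class and its MatchLength helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the library: runs a standard .NET regex
// anchored at the start of the input and reports the matched length.
public static class RexAnalogy
{
    // Returns the length of the match starting at offset 0, or -1 if none.
    public static int MatchLength(string pattern, string input)
    {
        // \G anchors the match at the position where the search starts.
        // Singleline lets '.' match newlines, approximating "any character".
        var match = Regex.Match(input, @"\G(?:" + pattern + ")", RegexOptions.Singleline);
        return match.Success ? match.Length : -1;
    }

    public static void Demo()
    {
        // Rex.Or(Rex.Char('a'), Rex.Char('b')).NoneOrMore().Then("abb")
        Console.WriteLine(MatchLength("(?:a|b)*abb", "ababb!"));      // 5

        // Rex.IfNot("-->").Then(Rex.AnyChar).NoneOrMore()
        // Negative lookahead: consume characters until "-->" is next.
        Console.WriteLine(MatchLength("(?:(?!-->).)*", "abc-->def")); // 3

        // Rex.Char('a').NoneOrMore().FollowedBy('b')
        // Positive lookahead: the 'b' is checked but not consumed.
        Console.WriteLine(MatchLength("a*(?=b)", "aaab"));            // 3
        Console.WriteLine(MatchLength("a*(?=b)", "aaac"));            // -1
    }
}
```

In both lookahead cases the checked text ("-->" or "b") is not part of the reported length, which mirrors the position reset described above.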

Compilation

An individual regular expression can be compiled into a delegate of type int RexEvaluator(string content, int offset, int length). Given a text input and its boundaries, the dynamic method finds and returns the length of the longest substring matched, or -1 if nothing matched:

var digit = Rex.Char("0-9");
var number = digit.OneOrMore();
var eval = Rex.Compile(number);
var input = "x123y";

Console.WriteLine(eval(input, 0, input.Length));     // outputs -1
Console.WriteLine(eval(input, 1, input.Length - 1)); // outputs  3
Console.WriteLine(eval(input, 1, 2));                // outputs  2
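
To make the evaluator contract concrete, here is a hand-written stand-in with the same signature that behaves like the compiled number expression on the input above. It is a sketch for illustration only, not the code the library generates: EvaluatorSketch and MatchDigits are hypothetical names, and it assumes that length counts characters starting from offset, as the outputs above suggest.

```csharp
using System;

// Hypothetical stand-in, not generated by the library.
public static class EvaluatorSketch
{
    // Same shape as the delegate described above; declared locally so the
    // sketch compiles without the library.
    public delegate int RexEvaluator(string content, int offset, int length);

    // Mimics the evaluator for digit.OneOrMore(): the length of the longest
    // run of ASCII digits starting at offset, limited to the given window.
    public static int MatchDigits(string content, int offset, int length)
    {
        // The window must not run past the end of the content.
        int end = Math.Min(content.Length, offset + length);
        int pos = offset;
        while (pos < end && content[pos] >= '0' && content[pos] <= '9')
            pos++;

        // One-or-more: zero digits consumed means no match.
        return pos > offset ? pos - offset : -1;
    }

    public static void Demo()
    {
        RexEvaluator eval = MatchDigits;
        var input = "x123y";
        Console.WriteLine(eval(input, 0, input.Length));     // -1
        Console.WriteLine(eval(input, 1, input.Length - 1)); // 3
        Console.WriteLine(eval(input, 1, 2));                // 2
    }
}
```

The match is always anchored at offset: the evaluator does not search forward for a later match, which is why the first call returns -1 even though digits appear later in the input.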