A C++ header-only templated shift-reduce parser generator library. Original name stands for context-free-grammar superset generator.
- Grammar rules definition using templates
- Fully templated parsing pipeline (lexer, parser and a user-defined AST class)
- Shift-reduce parser with optional lookahead(1) algorithm
- Legacy recursive-descent parser (WIP)
- Compile-time rules serialization in custom EBNF-like notation
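
For readers new to the technique, the shift-reduce idea from the list above can be illustrated with a small hand-written sketch. It does not use SuperCFG and does not reflect its internals: the `Sym` enum and `parse_number` function are hypothetical names, the grammar is hard-coded as `number := digit {digit}` (so at least one digit is required), and the terminal-to-`digit` reduction is folded into the shift step for brevity.

```cpp
#include <string_view>
#include <vector>

// Hypothetical symbols for a toy grammar: number := digit {digit}
enum class Sym { Digit, Number, Other };

// Returns true when the whole input reduces to a single Number symbol.
inline bool parse_number(std::string_view input) {
    std::vector<Sym> stack;
    for (char c : input) {
        // Shift: push the next character, already classified as Digit or Other.
        stack.push_back(c >= '0' && c <= '9' ? Sym::Digit : Sym::Other);

        // Reduce while the top of the stack matches a rule's right-hand side.
        for (bool reduced = true; reduced;) {
            reduced = false;
            const auto n = stack.size();
            if (n >= 2 && stack[n - 2] == Sym::Number && stack[n - 1] == Sym::Digit) {
                stack.pop_back();        // Number Digit -> Number
                reduced = true;
            } else if (n == 1 && stack[0] == Sym::Digit) {
                stack[0] = Sym::Number;  // Digit -> Number
                reduced = true;
            }
        }
    }
    // Accept only if exactly the start symbol remains.
    return stack.size() == 1 && stack[0] == Sym::Number;
}
```

Calling `parse_number("123")` accepts the input, while `parse_number("12a")` leaves an unreduced symbol on the stack and is rejected.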
This project requires C++20 support; no additional libraries are needed.
Tested on the following configurations:
- clang version 19.1.7, target: x86_64-pc-linux-gnu
- clang version 19.1.7, target: arm64-apple-darwin24.3.0
GCC support is currently limited.
This project uses Doxygen for documentation (WIP). Feature progress can be found in docs
cd examples
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_CXX_FLAGS=-ftemplate-depth=1000 # May be needed for larger grammars
make -j<threads>
./calc # Execute interactive calc example
./json # Execute interactive json example (not spec-compliant JSON: no escape characters or whitespace)
The first step is to define your grammar:
#include "cfg/gbnf.h"
#include "cfg/str.h"
#include "cfg/base.h"
#include "cfg/parser.h"
#include "cfg/containers.h"
// ...
// Define non-terminals
constexpr auto digit = NTerm(cs<"digit">());
constexpr auto number = NTerm(cs<"number">());
// Define terminals and grammar rules
// digit := '0'-'9'
constexpr auto d_digit = Define(digit, Alter(
    Term(cs<"1">()), Term(cs<"2">()), Term(cs<"3">()),
    Term(cs<"4">()), Term(cs<"5">()), Term(cs<"6">()),
    Term(cs<"7">()), Term(cs<"8">()), Term(cs<"9">()),
    Term(cs<"0">())
));
// Define a rule for numbers (sequence of digits)
constexpr auto d_number = Define(number, Repeat(digit));
// Combine the rules into a ruleset, which serves as the top-level root definition
constexpr auto ruleset = RulesDef(d_digit, d_number);
See docs/USAGE.md
Parser/lexer configuration flags are described in docs/CONFIGURATION.md
- Term(name) is a terminal character. Currently, only single-character terminals are fully supported.
- TermsRange(start, end) is a range of terminals that iterates lexicographically over the range [start, end]. Note that the exact order depends on the char type.
- NTerm(name) is a nonterminal with a unique name, which describes its type.
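
To illustrate what iterating over a closed range [start, end] means, here is a minimal sketch that expands a character range into an explicit list at compile time. `expand_range` is a hypothetical helper written for this example, not part of SuperCFG. The order follows the numeric values of the character type, which is why the result can differ between char types.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: lists every character in the closed range [Start, End],
// ordered by the numeric value of `char`.
template <char Start, char End>
constexpr auto expand_range() {
    static_assert(Start <= End, "range must not be empty");
    std::array<char, static_cast<std::size_t>(End - Start) + 1> out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(Start + static_cast<int>(i));
    return out;
}

constexpr auto digits = expand_range<'0', '9'>();
static_assert(digits.size() == 10);
static_assert(digits.front() == '0' && digits.back() == '9');
```

A range like this could stand in for the ten-way alternation of single-digit terminals shown earlier.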
Name | Description | Example |
---|---|---|
Concat | Concatenation | A,B,C |
Alter | Alternation | A or B or C |
Define | Definition | A := ... |
Optional | Zero or one time | [A] |
Repeat | Zero or more times | {A} |
Group | Parenthesis | (A) |
Comment | Comment | (*ABC*) |
SpecialSeq | Special sequence | ?ABC? |
Except | Exception | A-B |
End | Termination | ABC. |
RulesDef | Rules definition | |

Name | Description |
---|---|
RepeatExact | Repeat exactly M times |
RepeatGE | Repeat at least M times |
RepeatRange | Repeat [M,N] times |
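
The intended counting behavior of the extended repetition operators can be sketched with plain functions over a string of digits. These helpers are hypothetical and only mirror the descriptions in the table above, assuming that [M,N] is inclusive on both ends. They are not SuperCFG calls.

```cpp
#include <cstddef>
#include <string_view>

// True when every character of `s` is a decimal digit.
constexpr bool all_digits(std::string_view s) {
    for (char c : s)
        if (c < '0' || c > '9') return false;
    return true;
}

// Exactly m repetitions of a digit.
constexpr bool repeat_exact(std::string_view s, std::size_t m) {
    return all_digits(s) && s.size() == m;
}

// At least m repetitions of a digit.
constexpr bool repeat_ge(std::string_view s, std::size_t m) {
    return all_digits(s) && s.size() >= m;
}

// Between m and n repetitions of a digit, both bounds inclusive.
constexpr bool repeat_range(std::string_view s, std::size_t m, std::size_t n) {
    return all_digits(s) && s.size() >= m && s.size() <= n;
}

static_assert(repeat_exact("123", 3) && !repeat_exact("12", 3));
static_assert(repeat_ge("1234", 3) && !repeat_ge("12", 3));
static_assert(repeat_range("12", 1, 3) && !repeat_range("1234", 1, 3));
```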
SuperCFG can serialize grammar rules to a custom EBNF-like notation at compile-time. See docs/USAGE.md
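
As a rough illustration of how compile-time serialization can work in general, the sketch below concatenates fixed-size strings in `constexpr` context to produce a rule in the notation used by the tables above. All names here (`Text`, `text`, `serialize_repeat`, `serialize_define`) are hypothetical, and the exact output format of SuperCFG may differ.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical fixed-size string usable in constant expressions (no null terminator).
template <std::size_t N>
struct Text {
    std::array<char, N> chars{};
    constexpr std::string_view view() const { return {chars.data(), N}; }
};

// Builds a Text from a string literal, dropping the trailing '\0'.
template <std::size_t N>
constexpr Text<N - 1> text(const char (&s)[N]) {
    Text<N - 1> t{};
    for (std::size_t i = 0; i + 1 < N; ++i) t.chars[i] = s[i];
    return t;
}

// Concatenation; the result length is part of the type.
template <std::size_t A, std::size_t B>
constexpr Text<A + B> operator+(const Text<A>& a, const Text<B>& b) {
    Text<A + B> r{};
    for (std::size_t i = 0; i < A; ++i) r.chars[i] = a.chars[i];
    for (std::size_t i = 0; i < B; ++i) r.chars[A + i] = b.chars[i];
    return r;
}

// {A}: zero or more repetitions.
template <std::size_t N>
constexpr auto serialize_repeat(const Text<N>& inner) {
    return text("{") + inner + text("}");
}

// A := ...: rule definition.
template <std::size_t A, std::size_t B>
constexpr auto serialize_define(const Text<A>& name, const Text<B>& body) {
    return name + text(" := ") + body;
}

constexpr auto rule = serialize_define(text("number"), serialize_repeat(text("digit")));
static_assert(rule.view() == "number := {digit}");
```

Because the serialized text is a constant expression, it can be checked with `static_assert` or embedded in the binary without any runtime formatting.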
Some examples can be found in docs/EXAMPLES.md. The code is located in examples/json.cpp