Skip to content

enaix/SuperCFG

Repository files navigation

SuperCFG

A C++ header-only templated shift-reduce parser generator library. Original name stands for context-free-grammar superset generator.

Features

  • Grammar rules definition using templates
  • Fully templated parsing pipeline (lexer, parser and a user-defined AST class)
  • Shift-reduce parser with optional lookahead(1) algorithm
  • Legacy recursive-descent parser (WIP)
  • Compile-time rules serialization in custom *EBNF-like notation

Getting Started

Requirements

This project requires C++20 support, no additional libraries needed.

Tested on the following configurations:

  • clang version 19.1.7, target: x86_64-pc-linux-gnu
  • clang version 19.1.7, target: arm64-apple-darwin24.3.0

At this moment, gcc support is limited.

Documentation

This project uses doxygen for documentation (WIP). Feature progress can be found in docs

Building the examples

cd examples

mkdir -p build && cd build

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_CXX_FLAGS=-ftemplate-depth=1000 # May be needed for larger grammars

make -j<threads>

./calc # Execute interactive calc example

./json # Execute interactive json example (this is not json spec: no escape characters and whitespaces)

Grammar Definition Example

The first step is to define your grammar:

#include "cfg/gbnf.h"
#include "cfg/str.h"
#include "cfg/base.h"
#include "cfg/parser.h"
#include "cfg/containers.h"

// ...

// Define non-terminals
constexpr auto digit = NTerm(cs<"digit">());
constexpr auto number = NTerm(cs<"number">());

// Define terminals and grammar rules
// digit := '0'-'9'
constexpr auto d_digit = Define(digit, Alter(
    Term(cs<"1">()), Term(cs<"2">()), Term(cs<"3">()), 
    Term(cs<"4">()), Term(cs<"5">()), Term(cs<"6">()), 
    Term(cs<"7">()), Term(cs<"8">()), Term(cs<"9">()), 
    Term(cs<"0">())
));

// Define a rule for numbers (sequence of digits)
constexpr auto d_number = Define(number, Repeat(digit));

// Combine rules into a ruleset, it's a top-level root definition
constexpr auto ruleset = RulesDef(d_digit, d_number);

Parser Initialization and Usage

See docs/USAGE.md

Parser/lexer configuration flags are described in docs/CONFIGURATION.md

Symbols and operators

Term(name) is a terminal character. Currently it fully supports only single-character terminals

TermsRange(start, end) is a range of terminals, which lexicographically iterates over the range [start, end]. Note that the exact order depends on the char type.

NTerm(name) is a nonterminal with a unique name, which describes its type.

Basic Operators

Name Description Example
Concat Concatenation A,B,C
Alter Alternation A or B or C
Define Definition A := ...
Optional None or once [A]
Repeat None or ∞ times {A}
Group Parenthesis (A)
Comment Comment (*ABC*)
SpecialSeq Special sequence ?ABC?
Except Exception A-B
End Termination ABC.
RulesDef Rules definition

Extended Operators

Name Description
RepeatExact Repeat exactly M times
RepeatGE Repeat at least M times
RepeatRange Repeat [M,N] times

Grammar serialization

SuperCFG can serialize grammar rules to a custom EBNF-like notation at compile-time. See docs/USAGE.md

Examples

Some of the examples can be found in docs/EXAMPLES.md. The code is located in examples/json.cpp