Not yet released
- Use the migration guide when updating.
- Infrastructure
  - Updated required C++ standard to C++17.
  - Updated required CMake version to 3.8.
  - The macro `TAO_PEGTL_NAMESPACE` now contains the fully qualified namespace, e.g. `tao::pegtl`.
  - Added `[[nodiscard]]` or `[[noreturn]]` to most non-void functions.
- Meta-Data Layer
  - Replaced `analysis_t` with the more general and complete `rule_t` and `subs_t`.
  - Added functions to visit all rules of a grammar.
  - Added functions to measure rule coverage of a parsing run.
  - Moved the analysis function and header to contrib.
- Error Handling
  - Replaced `tao::pegtl::input_error` with `std::system_error`.
  - Added `must_if<>`.
    - Allows defining custom error messages for global errors.
    - Adds a non-intrusive way to define global parse errors for a grammar retroactively.
- Demangling
  - Removed the need for RTTI.
    - Some broken/unknown compilers will use RTTI as a fallback, without demangling.
  - Moved `tao::pegtl::internal::demangle<T>()` to `tao::demangle<T>()`.
  - Improved generated code to be shorter and more efficient.
- Parse Tree
  - Removed the need for RTTI.
- Other
  - Changed `byte_in_line` to 1-based counting.
  - Moved rule `eolf` from inline namespace `tao::pegtl::ascii` to `tao::pegtl`.
  - Changed rules in `tao/pegtl/contrib/integer.hpp` to not accept redundant leading zeros.
  - Added rules to `tao/pegtl/contrib/integer.hpp` that test unsigned values against a maximum.
  - Demoted UTF-16 and UTF-32 support to contrib.
  - Demoted UINT-8, UINT-16, UINT-32 and UINT-64 support to contrib.
  - Folded `contrib/counter.hpp` into `json_count.cpp`; count is superseded by coverage.
- Cleanup
  - Removed option of state's `S::success()` to have an extended signature to get access to the current `apply_mode`, `rewind_mode`, action- and control class (template).
  - Removed compatibility macros starting with `TAOCPP_PEGTL_`.
  - Removed compatibility uppercase enumerators.
  - Removed compatibility `peek_byte()` member functions.
  - Removed compatibility header `changes.hpp` from contrib.
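The two changes to the integer rules (no redundant leading zeros, and a check against a maximum) can be illustrated with a small stand-alone sketch. It does not use the PEGTL; `parse_unsigned` is a hypothetical helper invented for this illustration and is not how `tao/pegtl/contrib/integer.hpp` is implemented.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of the PEGTL: parses an unsigned decimal
// number, rejecting redundant leading zeros and values above `maximum`.
constexpr std::optional< std::uint64_t > parse_unsigned( const std::string_view text, const std::uint64_t maximum ) noexcept
{
   if( text.empty() ) {
      return std::nullopt;  // At least one digit is required.
   }
   if( ( text.size() > 1 ) && ( text.front() == '0' ) ) {
      return std::nullopt;  // "0" is accepted, "00" and "007" are not.
   }
   std::uint64_t value = 0;
   for( const char c : text ) {
      if( ( c < '0' ) || ( c > '9' ) ) {
         return std::nullopt;  // Not a decimal digit.
      }
      const auto digit = static_cast< std::uint64_t >( c - '0' );
      // Compare before multiplying so that the check itself cannot overflow.
      if( ( digit > maximum ) || ( value > ( maximum - digit ) / 10 ) ) {
         return std::nullopt;  // value * 10 + digit would exceed the maximum.
      }
      value = value * 10 + digit;
   }
   return value;
}

// Compile-time usage examples.
static_assert( *parse_unsigned( "255", 255 ) == 255, "255 fits into the maximum" );
static_assert( !parse_unsigned( "256", 255 ), "256 exceeds the maximum" );
static_assert( !parse_unsigned( "007", 255 ), "redundant leading zeros are rejected" );
```

Checking against the maximum before each multiplication keeps the sketch free of overflow even when `maximum` is the largest representable value.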
Released 2020-04-22
- Fixed excessive read-ahead with incremental inputs.
- Added state manipulators `remove_first_state`, `remove_last_states`, `rotate_states_right`, `rotate_states_left`, and `reverse_states` to contrib.
- Reduced the number of intermediate parse tree nodes.
Released 2020-04-05
- Fixed parse tree node generation to correctly remove intermediate nodes.
Released 2019-08-06
- Added fallback symbol demangling if RTTI is disabled.
- Fixed missing `string_input<>` in amalgamated header.
- Fixed `discard_input*` actions to properly forward the apply mode.
- Fixed contrib HTTP grammar for chunked data.
Released 2019-04-09
- Use the migration guide when updating.
- Changed enumerators to lowercase.
  - Renamed `tracking_mode::IMMEDIATE` to `tracking_mode::eager`.
  - Compatibility enumerators with uppercase names are still included.
    - Will be removed in version 3.0.0.
- Renamed `peek_byte()` to `peek_uint8()`.
  - Compatibility member functions with previous names are still included.
    - Will be removed in version 3.0.0.
- Allowed actions to implement `match`.
- Made deriving action class templates from `nothing` optional.
- Added debug tools `require_apply` and `require_apply0`.
- Added combinator class `rematch`.
- Improved the Parse Tree / AST interface to mostly hide its internal state.
- Added new action-based helpers `change_*.hpp`.
  - The control-based helpers in `contrib/changes.hpp` are still included.
    - Will be removed in version 3.0.0.
- Added new action-based helpers `disable_action.hpp` and `enable_action.hpp`.
- Added new action-based helpers `discard_input.hpp`, `discard_input_on_success.hpp`, and `discard_input_on_failure.hpp`.
- Added Clang Static Analyzer to the CI build.
- Added new Makefile target `amalgamate` to generate a single-header version of the PEGTL.
- Added support for Universal Windows Platform (UWP).
Released 2018-09-29
- Added new ASCII convenience rule `forty_two`.
- Added experimental `if_then` rule.
- Simplified how parse tree nodes can be selected.
- Reduced the number of intermediate parse tree nodes.
- Allowed an action class template to be used with the parse tree.
Released 2018-07-31
- Added `mmap_file<>` support for Windows.
- Added deduction guides for the input classes when compiling with C++17.
Released 2018-07-22
- Fixed endianness detection in test program.
Released 2018-06-22
- Added Conan packages.
- Fixed the UTF-8 decoder to no longer accept UTF-16 surrogates.
- Fixed the UTF-16 decoder to no longer accept unmatched UTF-16 surrogates.
- Fixed the UTF-32 "decoder" to no longer accept UTF-16 surrogates.
- Fixed `pegtl/contrib/unescape.hh` to no longer accept unmatched surrogates.
- Optimised convenience rule `two`.
- Added new convenience rule `three`.
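The surrogate fixes above rest on the fact that the range U+D800 to U+DFFF is reserved for UTF-16 and must never appear as a decoded code point in UTF-8 or UTF-32. The following stand-alone sketch shows that check for three-byte UTF-8 sequences only; `is_utf16_surrogate` and `decode_utf8_3` are hypothetical names for this illustration and are not taken from the PEGTL sources.

```cpp
#include <optional>

// True for code points in the range reserved for UTF-16 surrogates.
constexpr bool is_utf16_surrogate( const char32_t cp ) noexcept
{
   return ( cp >= 0xd800 ) && ( cp <= 0xdfff );
}

// Hypothetical helper: decodes one three-byte UTF-8 sequence and rejects
// malformed, overlong and surrogate encodings.
constexpr std::optional< char32_t > decode_utf8_3( const unsigned char b0, const unsigned char b1, const unsigned char b2 ) noexcept
{
   if( ( ( b0 & 0xf0 ) != 0xe0 ) || ( ( b1 & 0xc0 ) != 0x80 ) || ( ( b2 & 0xc0 ) != 0x80 ) ) {
      return std::nullopt;  // Not a well-formed three-byte sequence.
   }
   const char32_t cp = ( static_cast< char32_t >( b0 & 0x0f ) << 12 ) | ( static_cast< char32_t >( b1 & 0x3f ) << 6 ) | static_cast< char32_t >( b2 & 0x3f );
   if( cp < 0x800 ) {
      return std::nullopt;  // Overlong encoding of a smaller code point.
   }
   if( is_utf16_surrogate( cp ) ) {
      return std::nullopt;  // Surrogates are not valid in UTF-8.
   }
   return cp;
}

// Compile-time usage examples.
static_assert( *decode_utf8_3( 0xe2, 0x82, 0xac ) == 0x20ac, "U+20AC EURO SIGN decodes normally" );
static_assert( !decode_utf8_3( 0xed, 0xa0, 0x80 ), "the encoding of U+D800 is rejected" );
```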
Released 2018-05-31
- Fixed `opt` and `until` to work as documented in some rare edge cases.
- Used `opt_must` and `star_must` to optimise some included grammars.
Released 2018-05-14
- Added new convenience rule `opt_must`.
- Optimised convenience rule `if_must`.
- Fixed examples to compile with Visual Studio and MinGW.
- Added automated testing with GCC 8.
Released 2018-05-01
- Added rules to match Unicode properties via ICU to contrib.
- Improved the Parse Tree / AST interface.
- Fixed parse tree node generation to correctly remove intermediate nodes.
- Added big- and little-endian support to the UTF-16 and UTF-32 rules.
- Added rules for UINT-8 and big- and little-endian UINT-16, UINT-32 and UINT-64.
- Added member functions to `memory_input<>` to obtain the line around a position.
- Added member functions to `memory_input<>` to start again from the beginning.
- Added example for Python-style indentation-aware grammars.
- Added examples for regular, context-free, and context-sensitive grammars.
- Added example for how to parse with a symbol table.
- Added automated testing with Clang 6.
- Added automated testing with Clang's `-fms-extensions`.
- Fixed build with Clang when `-fms-extensions` is used (`clang-cl`).
Released 2018-02-17
- Use the migration guide when updating.
- Improved and documented the Parse Tree / AST support.
- Changed prefix of all macros from `TAOCPP_PEGTL_` to `TAO_PEGTL_`.
  - Compatibility macros with the old names are provided.
  - They will be removed in version 3.0.0.
- Added a deleted overload to prevent creating a `memory_input<>` from a temporary `std::string`.
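The deleted overload mentioned above guards against a dangling pointer: an input that only stores pointers into a string must not be constructed from a temporary that is destroyed immediately afterwards. A minimal sketch of the technique, using a hypothetical `view_input` class rather than the real `memory_input<>`:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical, simplified stand-in for an input class that only stores
// pointers into the caller's data and does not own it.
class view_input
{
public:
   // Binding to an lvalue is fine as long as the string outlives the input.
   explicit view_input( const std::string& data ) noexcept
      : m_begin( data.data() ),
        m_end( data.data() + data.size() )
   {}

   // A temporary would be destroyed at the end of the full expression and
   // leave m_begin and m_end dangling, so this overload is deleted.
   explicit view_input( std::string&& ) = delete;

   std::size_t size() const noexcept
   {
      return static_cast< std::size_t >( m_end - m_begin );
   }

private:
   const char* m_begin;
   const char* m_end;
};

// The lvalue case compiles, the temporary case is rejected at compile time.
static_assert( std::is_constructible_v< view_input, const std::string& >, "lvalue strings are accepted" );
static_assert( !std::is_constructible_v< view_input, std::string&& >, "temporary strings are rejected" );
```

Without the deleted overload, a temporary would silently bind to the `const std::string&` constructor, which is exactly the mistake the change prevents.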
Released 2018-02-08
- Fixed build on older systems where `O_CLOEXEC` is not available.
- Added automated testing with Android 6.0 and 7.0.
Released 2018-01-01
- Added more `noexcept`-specifications.
- Fixed most `clang-tidy`-issues.
Released 2017-12-16
- Worked around a Visual Studio 15.5 bug.
Released 2017-12-14
- Fixed linkage of `tao::pegtl::internal::file_open`.
- Improved error message for missing `source` parameter of `string_input<>`.
Released 2017-12-11
- Added constructor to `read_input<>` that accepts a `FILE*`, see issue #78.
- Enhanced `apply`, `apply0` and `if_apply` to support `apply()`/`apply0()` returning boolean values.
- Simplified implementation of `raw_string`, the optional `Contents...` rules' `apply()`/`apply0()` are now called with the original states.
- Fixed the tracer to work with `apply()`/`apply0()` returning boolean values.
- Fixed, simplified and improved `examples/parse_tree.cpp`.
Released 2017-11-22
- Bumped version.
Released 2017-11-22
- Celebrating the PEGTL's 10th anniversary!
- Fixed missing call to the control class' `failure()` when a rule with `apply()` with a boolean return type fails.
- Fixed string handling in `examples/abnf2pegtl.cc`.
- Simplified/improved Android build.
Released 2017-09-24
- Added possibility for an action's `apply()` or `apply0()` to return `bool`, which is then used to determine overall success or failure of the rule to which such an action was attached.
- Added `<tao/pegtl/contrib/parse_tree.hpp>` and the `examples/parse_tree.cpp` application that shows how to build a parse tree. The example goes beyond a traditional parse tree and demonstrates how to select which nodes to include in the parse tree and how to transform the nodes into an AST-like structure.
- Added `bom` rules for UTF-8, UTF-16 and UTF-32.
- Added some missing includes for `config.hpp`.
- Added automated testing with Clang 5.
- Added automated testing with Xcode 9.
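The idea of an action whose `bool` return value decides whether the rule succeeds can be sketched without the library. Everything below is a hypothetical miniature (`octet_action`, `match_digits` and `consumed` are invented names); it only mirrors the concept and does not reproduce the PEGTL's actual `apply()` signature or matching machinery.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical action: accepts the matched digits only if they form a value
// of at most 255. Returning false turns the match into a failure.
struct octet_action
{
   static constexpr bool apply( const std::string_view matched ) noexcept
   {
      unsigned value = 0;
      for( const char c : matched ) {
         value = value * 10 + static_cast< unsigned >( c - '0' );
         if( value > 255 ) {
            return false;
         }
      }
      return true;
   }
};

// Hypothetical stand-in for "match a rule, then call its action": matches one
// or more digits at `pos`. On success `pos` is advanced past the digits; if
// the rule or the action fails, `pos` is left unchanged.
template< typename Action >
constexpr bool match_digits( const std::string_view input, std::size_t& pos )
{
   std::size_t end = pos;
   while( ( end < input.size() ) && ( input[ end ] >= '0' ) && ( input[ end ] <= '9' ) ) {
      ++end;
   }
   if( end == pos ) {
      return false;  // The rule itself did not match.
   }
   if( !Action::apply( input.substr( pos, end - pos ) ) ) {
      return false;  // The action vetoed the match.
   }
   pos = end;
   return true;
}

// Convenience wrapper: number of bytes consumed from the start of `input`.
template< typename Action >
constexpr std::size_t consumed( const std::string_view input )
{
   std::size_t pos = 0;
   (void)match_digits< Action >( input, pos );
   return pos;
}

static_assert( consumed< octet_action >( "200." ) == 3, "200 is accepted by the action" );
static_assert( consumed< octet_action >( "300." ) == 0, "300 matches the rule but is vetoed by the action" );
```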
Released 2017-06-27
- Fixed shadow warning.
Released 2017-06-27
- Fixed `raw_string` with optional parameters.
Released 2017-06-25
- Bumped version.
Released 2017-06-25
- Fixed build with MinGW on Windows.
- Added automated testing with MinGW.
Released 2017-06-23
- Added optional template parameters to `raw_string` for rules that the content must match.
- Added new contrib rules `rep_one_min_max` and `ellipsis`.
- Fixed broken `TAOCPP_PEGTL_KEYWORD` macro.
- Fixed a bug in the contrib HTTP grammar which prevented it from parsing status lines in some cases.
- Fixed build with MinGW-w64 on Windows.
- Added automated testing with MinGW-w64.
- Added automated testing with GCC 7.
Released 2017-05-18
- Project
  - Migrated to "The Art of C++".
  - Use the migration guide when updating.
  - Version 2.z can be installed and used in parallel to version 1.y of the PEGTL.
  - The semantics of all parsing rules and grammars is the same as for versions 1.y.
- Input Layer
  - Added support for custom incremental input readers.
  - Added support for parsing C streams, i.e. `std::FILE*`.
  - Added support for parsing C++ streams, i.e. `std::istream`.
  - Added support for different EOL-styles.
  - Renamed class `position_info` to `position`.
  - Added the byte position to input classes and `position`.
  - Added fast parsing without line counting (except in errors).
  - Refactored the `input` class into multiple input classes.
  - Refactored the file parser classes into input classes.
  - Refactored the handling of nested parsing.
  - Removed the `begin()` member from class `position`.
  - Removed most parsing front-end functions.
- Parsing Rules
- String Macros
  - Renamed to `TAOCPP_PEGTL_(I)STRING`.
  - Increased allowed string length to 512.
  - Allowed embedded null bytes.
  - Reduced template instantiation depth.
- Other Changes
  - Added `apply()` and `apply0()` to the control class.
  - Optimised superfluous input markers.
  - Allowed optimisation of actions that do not need the input.
  - Replaced layered matching with superior Duseltronik™.
  - Reduced template instantiation depth.
  - Added support for CMake.
  - Added automated testing with Visual Studio 2015 and 2017.
  - Added automated testing with Android 5.1, NDK r10e.
Released 2016-04-06
- Fixed unit test to use `eol` instead of hard-coded line ending.
Released 2016-04-06
- Tentative Android compatibility.
- Fixed build with MinGW on Windows.
- Changed file reader to open files in binary mode.
- Changed `eol` and `eolf` to accept both Unix and MS-DOS line endings.
- Optimised bumping the input forward and removed little-used bump function.
- Simplified grammar analysis algorithm (and added more `analyze()` tests).
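Accepting both Unix and MS-DOS line endings means matching either `"\n"` or `"\r\n"`. A minimal stand-alone sketch, with hypothetical helper names `match_eol` and `match_eolf` that are not taken from the library:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the length of the line ending at the start of
// `input`, i.e. 1 for "\n", 2 for "\r\n", and 0 if there is no line ending.
constexpr std::size_t match_eol( const std::string_view input ) noexcept
{
   if( !input.empty() && ( input[ 0 ] == '\n' ) ) {
      return 1;  // Unix line ending.
   }
   if( ( input.size() >= 2 ) && ( input[ 0 ] == '\r' ) && ( input[ 1 ] == '\n' ) ) {
      return 2;  // MS-DOS line ending.
   }
   return 0;  // In this sketch a lone "\r" is not treated as a line ending.
}

// Hypothetical counterpart for "end of line or end of file".
constexpr bool match_eolf( const std::string_view input ) noexcept
{
   return input.empty() || ( match_eol( input ) != 0 );
}

static_assert( match_eol( "\nabc" ) == 1, "Unix line ending" );
static_assert( match_eol( "\r\nabc" ) == 2, "MS-DOS line ending" );
static_assert( match_eolf( "" ), "end of input counts for eolf" );
```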
Released 2015-11-12
- Improved the JSON grammar and JSON string escaping.
- Added JSON test suite from http://json.org/JSON_checker/.
- Optimised bumping the input forward and string unescaping.
- Promoted `examples/json_changes.hh` to `pegtl/contrib/changes.hh`.
Released 2015-09-21
- Added `file_parser` as alias for `mmap_parser` or `read_parser` depending on availability of the former.
- Added Clang 3.7 to the automated tests.
- Added Mac OS X with Xcode 6 and Xcode 7 to the automated tests.
- Added coverage test and improved test coverage to 100%.
- Fixed state changing bug in `json_build_one` example.
Released 2015-08-23
- Added `pegtl_string_t` and `pegtl_istring_t` to simplify string definitions as follows:
pegtl::string< 'h', 'e', 'l', 'l', 'o' > // Normal
pegtl_string_t( "hello" ) // New shortcut
- Added `examples/abnf2pegtl.cc` application that converts grammars based on ABNF (RFC 5234) into a PEGTL C++ grammar.
- Added `contrib/alphabet.hh` with integer constants for alphabetic ASCII letters.
Released 2015-07-31
- Renamed namespace `pegtl::ucs4` to `pegtl::utf32` and generally adopted UTF-32 in all naming.
- Added experimental support for UTF-16 similar to the previously existing UTF-32 parsing rules.
- Added support for merging escaped UTF-16 surrogate pairs to `pegtl/contrib/unescape.hh`.
- Fixed incorrect handling of escaped UTF-16 surrogate pairs in the JSON examples.
- A state's `S::success()` can now have an extended signature to get access to the current `apply_mode`, action- and control class (template).
- The `contrib/raw_string` class template now calls `Action<raw_string<...>::content>::apply()` with the user's state(s).
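Merging an escaped UTF-16 surrogate pair, as needed for JSON strings such as `"\uD83D\uDE00"`, combines a high and a low surrogate into one code point beyond U+FFFF. The arithmetic is standard UTF-16; the function name `merge_surrogates` is a hypothetical choice for this sketch and is not the interface of `pegtl/contrib/unescape.hh`.

```cpp
#include <optional>

constexpr bool is_high_surrogate( const char16_t c ) noexcept
{
   return ( c >= 0xd800 ) && ( c <= 0xdbff );
}

constexpr bool is_low_surrogate( const char16_t c ) noexcept
{
   return ( c >= 0xdc00 ) && ( c <= 0xdfff );
}

// Hypothetical helper: merges a high and a low surrogate into the code point
// they encode, or returns nothing if the two units do not form a valid pair.
constexpr std::optional< char32_t > merge_surrogates( const char16_t high, const char16_t low ) noexcept
{
   if( !is_high_surrogate( high ) || !is_low_surrogate( low ) ) {
      return std::nullopt;  // Unmatched or swapped surrogates.
   }
   // Each surrogate contributes 10 bits; the result is offset by 0x10000.
   return static_cast< char32_t >( 0x10000 + ( ( static_cast< char32_t >( high ) - 0xd800 ) << 10 ) + ( static_cast< char32_t >( low ) - 0xdc00 ) );
}

static_assert( *merge_surrogates( 0xd83d, 0xde00 ) == 0x1f600, "\\uD83D\\uDE00 encodes U+1F600" );
static_assert( !merge_surrogates( 0xde00, 0xd83d ), "a swapped pair is rejected" );
```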
Released 2015-03-29
Version 1.0.0 was a very large refactoring based on the previous years of experience. The core design and approach were kept, but nearly all details of the implementation were changed, and some parts were added to, or removed from, the library. Semantic versioning was introduced with version 1.0.0.
- Deprecated old site on Google code and published new version on GitHub.
- Removed the semi-automatic pretty-printing of grammar rules; now the class names are used, when possible demangled.
- Renamed rule classes with multiple words in their names to use underscores, e.g. `ifmust<>` is now `if_must<>`.
- Removed support for incremental/stream parsing to allow for some simplifications and optimisations (reintroduced in 2.0.0).
- Removed the rules `apply<>` and `if_apply<>` that were used to directly call actions from within the grammar (reintroduced in 2.0.0), and:
- Where the other method of attaching actions to rules in PEGTL 0.x required specialisation of a given class template `action<>`, in PEGTL 1.y the action class template can be chosen by the user and changed at any point in the grammar.
- As a side-effect there is a much cleaner way of enabling and disabling actions in a portion of the grammar.
- Actions now have access to the current position in the input, i.e. to the filename, and line and column number.
- Actions now receive a pointer to, and the size of, the matched portion of the input (previously a `std::string` with a copy of the matched data), therefore there is no distinction between actions that require access to the matched data and those that don't; furthermore:
  - The object via which actions gain access to the matched data is similar to that which rules receive, so actions can easily invoke another grammar on the matched data.
- The `at<>` and `not_at<>` rules now call their subordinate rules with actions disabled.
- The variadic `states...` arguments that are passed through all rule invocations for use by the actions are not forwarded with `std::forward<>` anymore since it (usually) doesn't make much sense to move them, and accidentally moving multiple times was a possible error scenario.
- There are now five different `rep` rules for repeating a sequence of rules with more control over the acceptable or required number of repetitions.
- There are new rules `try_catch<>` and `try_catch_type<>` that convert global errors, i.e. exceptions, into local errors, i.e. a return value of `false`.
- Unified concept for actions and debug hooks, i.e. just like the actions are called from a class template that is passed into the top-level `parse()` function, there is another class template that is called for debug/trace and error throwing purposes; both can be changed at any point within the grammar.
- A large under-the-hood reorganisation has the benefit of preventing actions from being invoked on rules that are implementation details of other rules, e.g. the `pad< Rule, Padding >` rule contains `star< Padding >` in its implementation, so a specialisation of the action-class-template for `star< Padding >` would be called within `pad<>`, even though the `star< Padding >` was not explicitly written by the user; in PEGTL 1.y these unintended action invocations no longer occur.
- Partial support for Unicode has been added in the form of some basic rules like `one<>` and `range<>` also being supplied in UTF-8 (and experimental UTF-16 and UTF-32) aware versions that can correctly process arbitrary code points from `0` to `0x10ffff`.
- The supplied input classes work together with the supplied exception throwing to support better error locations when performing nested file parsing, i.e. a `parse_error` contains a vector of parse positions.
- Added a function to analyse a grammar for the presence of infinite loops, i.e. cycles in the rules that do not (necessarily) consume any input, like left recursion.
- As actions are applied to a grammar in a non-invasive way, several common grammars were added to the PEGTL as documented in Contrib and Examples.
- The `list<>`-rule was replaced by a set of new list rules with different padding semantics.
- The `at_one<>` and other rules `foo` that are merely shortcuts for `at< foo >` were removed.
- The `if_then<>` rule was removed.
- The `error_mode` flag was removed.
- The semantics of the `must<>` rules was changed to convert local failure to global failure only for the immediate sub-rules of a `must<>` rule.
- The `parse()` functions now return a `bool` and can also produce local failures. To obtain the previous behaviour of success-or-global-failure, the top-level grammar rule has to be wrapped in a `must<>`.
Released 2012-12
- Removed superfluous includes (issue 5 from Google code hosting).
- Fixed bug in `not_at` rule regarding wrong propagation of errors (issue 3 from Google code hosting).
Released 2011-02
- Fixed bug in `not_at` rule regarding wrong propagation of errors (issue 3 from Google code hosting).

- Fixed missing template arguments in the implementation of `smart_parse_string()`.

- Fixed broken convenience rules `space_until_eof` and `blank_until_eol`.
- Extended the included examples that show how to build parse trees etc.
- Optimised object file footprint of class `printer` and some related functions.
- Renamed class `rule_helper` to `rule_base` and `action_helper` to `action_base`.

- Changed the type of exceptions thrown by the library to `pegtl::parse_error`.
- Changed class `basic_debug` to only generate a grammar back-trace when a `pegtl::parse_error` is flying.
- Changed logging to use a virtual member function on the debug class inherited from common debug base class.
- Removed all `*_parse_*_nothrow()` parse functions.
- Removed the `_throws` substring from all remaining parse functions and changed the return type to `void`.
- Added convenience classes `file_input`, `ascii_file_input` and `dummy_file_input` for custom parse functions.

- Changed pretty-printing of the `until` and `if...` rules (consistency).
- Changed pretty-printing of rules to use ":=" instead of "===" (conciseness).
- Renamed rule `action` to `ifapply` and removed rule `action_nth` (orthogonality).
- Renamed action `apply_nth` to `nth`, and renamed some other actions (consistency).
- Extended pretty-printing to the `apply` and `ifapply` rules (completeness).
The last of these changes effectively requires custom action classes to derive either from a valid rule class, or from the new class `pegtl::action_helper<>`, passing itself as template argument.

- Fixed and cleaned up the rule pretty-printer in many places (readability).
- Added new convenience rule `enclose`, useful for quoted strings (convenience).
- Added new rule `apply` to unconditionally apply an action with empty matched string (convenience).
- Added action argument to `list` rule and added action `nop` for use as default action (convenience).

- Fixed some bugs in the pretty-printer; still in the experimental phase (usability).
- Added new rules `padl` and `padr` (convenience).
- Added example for quoted strings with arbitrary Unicode characters (documentation).
- Changed rule `pad` to not suppress the padding in diagnostic messages (consistency).

- Cleaned up the source to compile with `-std=c++0x -pedantic` (compliance).
- Cleaned out some superfluous compiler flags from the Makefile (minimalism).
- Changed the default compiler to `g++`, which can be overridden by `$CXX` (consistency).
- Cleaned up unit tests for where `char` is signed but `-fno-strict-overflow` is not given (compliance).
- Removed `list/not_list/at_list/at_not_list`, but `one/not_one/at_one/at_not_one` are now variadic (orthogonality).
- Removed the redundant rules `space_star`, `space_plus`, `blank_star`, and `blank_plus` (minimalism).
- Added new rule class `list` (not to be confused with the old, very different, rule `list`) (convenience).
- Changed class `seq` to invoke the `marker` with a modified `Must` flag for single-rule sequences (performance).
- Changed rule class `until1` to be a specialisation of `until`, rather than have a different name (consistency).
- Changed around the order of the template arguments of the `until` rule (consistency and flexibility).
- Changed around the order of the template arguments of the `rep` rule and reduced to strict repeat (minimalism).
- Changed many rule classes from one template argument to variadic sequence of arguments (flexibility).
- Changed the pretty-printing of rules, this is work in progress (aesthetics).
- Fixed the exception that occurred when `mmap()`ing an empty file (correctness).

- Added the missing `pegtl.hh` header file to the release archive...

- Cleanly layered implementation of `action_nth` (flexibility).
- Renamed class `action_all` back to `action` (was better that way).
- Moved main `pegtl.hh` include file out of `pegtl` directory (simplicity).
- Renamed the rule method from `s_match` to `match` (readability).
- Renamed the action method from `matched` to `apply` (readability).
- Renamed the rule method from `s_insert` to `prepare` (consistency).
- Changed the input iterator classes to report byte offsets (consistency).
- Added rule and action class to match captured sub-expressions (experiment).
- Changed class `action` to invoke arbitrarily many actions (succinctness).
- Changed classes `ifmust` and `ifthen` to accept arbitrarily many 'then' rules (succinctness).
- Fixed potential dangling reference in helper class `names` (correctness).

- Added parser functions `parse_forward` for forward iterators (completeness).
- Renamed parser functions for input iterators to `parse_input` (consistency).
- Added parser functions `parse_file` for files, implemented with `mmap(2)` (necessity).
- Added initial support for customised logging of error messages (flexibility).
- Added support for ranges of input iterators with automatic minimal buffering (flexibility).
- Added class `action_nth` (flexibility).
- Renamed class `action` to `action_all` (consistency).
- Changed class `marker` to a nop when "must" is true (performance).
- Changed `dummy_debug` to interpret "must" tracking (consistency).
- Fixed typo in name of `PEGTL_IMPURE_OPTIMISATIONS` macro (correctness).
- Made the marker class a sub-class of the input class (simplicity).
- Renamed some of the classes named `white`, `space`, or `blank` (consistency).
- Fixed some issues in the R6RS example (CFG to PEG mismatch, only first datum).
- Added missing template arguments to `smart_parse`-functions (correctness).

- Removed some small superfluous functions (less is more).
- Changed the "must" tracking from run-time to compile-time (better?).
- Optimised behaviour of `seq<>` and `string<>` (performance).
- Added detection of division-by-zero to calculator example.
- Removed data source debug tracking from the library (simplicity).
- Removed run-time limits on rule applications and nesting (simplicity).
- Disentangled a couple of header files (maintainability).
- Renamed class `iterator_input` to `forward_input` (consistency).
- Added class `string_input` to initialise `forward_input` from a string (convenience).
- Removed template argument Rule to action functor's `matched()` method (simplicity).

- Added more wrapper functions for parsing (convenience).
- Renamed existing wrapper functions for parsing (consistency).
- Added `rewind()` method to class `iterator_input` (indirect).

- Added more directory structure.
- Fixed compile-error in `sexpression.cc` (correctness).

- Fixed back-tracking in class `string` (correctness).
- Fixed order of operands in calculator example (correctness).
- Added Scheme R6RS grammar (example).
- Fixed behaviour at end-of-input (aesthetics).
- Fixed behaviour and use of class `position` (correctness).
- Changed to lazy initialisation of pretty-printer (performance).
- Changed the design of the input and parser classes (flexibility).
- Changed how expression rules provide their printer key (simplicity).
Released 2008
- First public release.
Development of the PEGTL started in November 2007 as an experiment in C++0x. It is based on ideas from the YARD library by Christopher Diggins.
Copyright (c) 2007-2020 Dr. Colin Hirsch and Daniel Frey