A regular expression engine implemented in Python. It can match with either an NFA or a DFA, and can optionally minimize the DFA.
from regex import Regex
st = 'THISISREGEXTEST'
pattern = '([A-Z]*|[0-9]+)'
regex = Regex(st, pattern)
result = regex.match()
print(result)
Regex constructor parameters:
(input_string, pattern_string, mode=1, minimize=True)
- mode: the finite-state automaton used for matching
  mode = 1 - NFA
  mode = 2 - DFA
- minimize: whether to minimize the DFA (only applies when mode = 2)
See sample.py for details.
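As a rough illustration of what minimizing a DFA involves, the sketch below merges equivalent states by partition refinement (Moore's algorithm). It is not taken from this repository: the `minimize` function, the `delta` table layout, and the tiny example automaton are all assumptions made for the example, and the transition function is assumed to be total.

```python
def minimize(states, alphabet, delta, accepting):
    """Group equivalent DFA states (hypothetical helper, not this repo's API).

    delta maps (state, char) -> state and must be defined for every pair.
    Returns a dict mapping each state to the id of its equivalence block.
    """
    # Initial partition: accepting states vs. non-accepting states.
    block = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        refined = {}
        for s in states:
            # Two states stay together only if they are in the same block
            # and every character leads them into the same block.
            signature = (block[s],) + tuple(block[delta[s, ch]] for ch in alphabet)
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return refined  # no block was split, so the partition is stable
        block = refined


# Example DFA over {'a'} accepting one or more 'a'; states 1 and 2 are redundant copies.
states = [0, 1, 2]
alphabet = ['a']
delta = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 1}
blocks = minimize(states, alphabet, delta, accepting={1, 2})
```

States that end up with the same block id can be merged into a single state of the minimized DFA.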
group ::= ("(" expr ")")*
expr ::= factor_conn ("|" factor_conn)*
factor_conn ::= factor | factor factor*
factor ::= (term | term ("*" | "+" | "?"))*
term ::= char | "[" char "-" char "]" | "."
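One way to read the grammar above is as a recursive-descent parser with one method per rule. The following is a minimal sketch, not the parser used in this repository: the `Parser` class, the tuple-based tree, and the choice to handle a parenthesized group inside `term` are assumptions made for illustration.

```python
class Parser:
    """Hypothetical recursive-descent parser; returns a tree of nested tuples."""

    def __init__(self, pattern):
        self.s = pattern
        self.i = 0

    def peek(self):
        return self.s[self.i] if self.i < len(self.s) else None

    def eat(self, expected=None):
        ch = self.peek()
        if ch is None or (expected is not None and ch != expected):
            raise ValueError(f"unexpected input at position {self.i}")
        self.i += 1
        return ch

    def expr(self):  # expr ::= factor_conn ("|" factor_conn)*
        node = self.factor_conn()
        while self.peek() == '|':
            self.eat('|')
            node = ('or', node, self.factor_conn())
        return node

    def factor_conn(self):  # factor_conn ::= factor factor*
        node = self.factor()
        while self.peek() is not None and self.peek() not in '|)':
            node = ('cat', node, self.factor())
        return node

    def factor(self):  # factor ::= term ("*" | "+" | "?")?
        node = self.term()
        if self.peek() is not None and self.peek() in '*+?':
            node = (self.eat(), node)
        return node

    def term(self):  # term ::= char | "[" char "-" char "]" | "."
        ch = self.eat()
        if ch == '(':  # assumption: groups are parsed as a kind of term
            node = self.expr()
            self.eat(')')
            return node
        if ch == '[':
            lo = self.eat()
            self.eat('-')
            hi = self.eat()
            self.eat(']')
            return ('range', lo, hi)
        if ch in '*+?|)':
            raise ValueError(f"unexpected {ch!r} at position {self.i - 1}")
        return ('any',) if ch == '.' else ('char', ch)


def parse(pattern):
    parser = Parser(pattern)
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"trailing input at position {parser.i}")
    return tree
```

With this sketch, `parse('([A-Z]*|[0-9]+)')` yields an `'or'` node whose two children are a starred `A-Z` range and a plus-repeated `0-9` range.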
All of the basic regular expression syntax is implemented.
Tokens = {
'.': Token.ANY,
'^': Token.AT_BOL,
'$': Token.AT_EOL,
']': Token.CCL_END,
'[': Token.CCL_START,
'}': Token.CLOSE_CURLY,
')': Token.CLOSE_PAREN,
'*': Token.CLOSURE,
'-': Token.DASH,
'{': Token.OPEN_CURLY,
'(': Token.OPEN_PAREN,
'?': Token.OPTIONAL,
'|': Token.OR,
'+': Token.PLUS_CLOSE,
}
This token table is defined in lex.token.
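A table like this is typically consumed by a lexer that maps each pattern character to a token and treats everything else as a literal. The sketch below shows the idea under stated assumptions: the `Token` enum is redefined locally so the example is self-contained, and `Token.LITERAL` and the `tokenize` function are hypothetical names that may not match the real lex module.

```python
from enum import Enum, auto


class Token(Enum):
    ANY = auto()
    AT_BOL = auto()
    AT_EOL = auto()
    CCL_END = auto()
    CCL_START = auto()
    CLOSE_CURLY = auto()
    CLOSE_PAREN = auto()
    CLOSURE = auto()
    DASH = auto()
    OPEN_CURLY = auto()
    OPEN_PAREN = auto()
    OPTIONAL = auto()
    OR = auto()
    PLUS_CLOSE = auto()
    LITERAL = auto()  # assumption: not part of the table above


Tokens = {
    '.': Token.ANY, '^': Token.AT_BOL, '$': Token.AT_EOL,
    ']': Token.CCL_END, '[': Token.CCL_START,
    '}': Token.CLOSE_CURLY, ')': Token.CLOSE_PAREN,
    '*': Token.CLOSURE, '-': Token.DASH,
    '{': Token.OPEN_CURLY, '(': Token.OPEN_PAREN,
    '?': Token.OPTIONAL, '|': Token.OR, '+': Token.PLUS_CLOSE,
}


def tokenize(pattern):
    """Return (token, character) pairs; unknown characters become literals."""
    return [(Tokens.get(ch, Token.LITERAL), ch) for ch in pattern]
```

For example, `tokenize('a*|.')` produces a literal, a closure, an or, and an any token, in that order. Escape sequences are deliberately left out of this sketch.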
The implementation favors simplicity over polished code style.
- Lexical analysis of regular expressions
- Definition of an NFA node and construction of the NFA
- Definition of a DFA node and construction of the DFA
- Matching the input against the NFA or DFA
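
The steps listed above can be sketched end to end in a few dozen lines. This is an illustrative outline, not the code in this repository: it assumes a Thompson-style NFA built from a small tuple tree, whole-string matching, and a subset construction for the DFA. `Builder`, `closure`, `move`, `nfa_match`, `to_dfa`, and `dfa_match` are hypothetical names.

```python
from itertools import count

EPSILON = None  # label used for epsilon edges


class Builder:
    """Builds NFA fragments from trees like ('cat', ('char', 'a'), ('char', 'b'))."""

    def __init__(self):
        self.edges = {}  # state -> list of (label, target)
        self.ids = count()

    def link(self, src, label, dst):
        self.edges.setdefault(src, []).append((label, dst))

    def build(self, node):
        """Return (start, accept) states of the fragment for this node."""
        start, accept = next(self.ids), next(self.ids)
        kind = node[0]
        if kind == 'char':
            self.link(start, node[1], accept)
        elif kind == 'cat':
            s1, a1 = self.build(node[1])
            s2, a2 = self.build(node[2])
            self.link(start, EPSILON, s1)
            self.link(a1, EPSILON, s2)
            self.link(a2, EPSILON, accept)
        elif kind == 'or':
            for child in node[1:]:
                s, a = self.build(child)
                self.link(start, EPSILON, s)
                self.link(a, EPSILON, accept)
        elif kind == 'star':
            s, a = self.build(node[1])
            for src in (start, a):  # enter or repeat the body, or leave the loop
                self.link(src, EPSILON, s)
                self.link(src, EPSILON, accept)
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
        return start, accept


def closure(edges, states):
    """All states reachable from `states` through epsilon edges only."""
    seen, stack = set(states), list(states)
    while stack:
        for label, dst in edges.get(stack.pop(), ()):
            if label is EPSILON and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def move(edges, states, ch):
    return {dst for s in states for label, dst in edges.get(s, ()) if label == ch}


def nfa_match(edges, start, accept, text):
    current = closure(edges, {start})
    for ch in text:
        current = closure(edges, move(edges, current, ch))
    return accept in current


def to_dfa(edges, start, accept):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    alphabet = {lab for outs in edges.values() for lab, _ in outs if lab is not EPSILON}
    d_start = closure(edges, {start})
    table, todo = {}, [d_start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        table[state] = {}
        for ch in alphabet:
            target = closure(edges, move(edges, state, ch))
            if target:
                table[state][ch] = target
                todo.append(target)
    return d_start, table, {s for s in table if accept in s}


def dfa_match(d_start, table, accepting, text):
    state = d_start
    for ch in text:
        state = table[state].get(ch)
        if state is None:
            return False
    return state in accepting


# (a|b)*c
tree = ('cat', ('star', ('or', ('char', 'a'), ('char', 'b'))), ('char', 'c'))
builder = Builder()
start, accept = builder.build(tree)
dfa = to_dfa(builder.edges, start, accept)
```

The NFA matcher tracks a set of live states per input character, while the DFA matcher follows a single precomputed transition per character; both should agree on every input.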