A Python implementation of a lexical analyzer that supports full scans, state-based lexing, and lookahead.
Warning: this is not a lexer GENERATOR in the classical sense. It does not produce any Python code. It is a plain scanner that tokenizes the given input string into a given set of tokens by matching regular expressions. You can therefore change the token definitions at runtime and reuse the same code for any token set.
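To make the "no code generation" point concrete, here is a minimal sketch of a regex-table scanner built only on the standard `re` module. It is not PyLexer's actual implementation: the function name `simple_scan`, the first-matching-rule-wins policy, and the `(name, text)` tuple output are all assumptions chosen for illustration.

```python
import re


def simple_scan(config, text):
    """Tokenize text with a {regex: name} table; empty names are skipped.

    Hypothetical helper for illustration only, not part of PyLexer.
    """
    # Rules are compiled on every call, so each call may use a different table.
    rules = [(re.compile(pattern), name) for pattern, name in config.items()]
    tokens = []
    pos = 0
    while pos < len(text):
        for regex, name in rules:
            match = regex.match(text, pos)
            # Require a non-empty match so the scan always advances.
            if match and match.end() > pos:
                if name:  # an empty name means "ignore this token"
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
    return tokens


# Two unrelated token sets handled by the same scanning code.
arithmetic = {r"\s+": "", r"\d+": "number", r"\+": "plus"}
words = {r"\s+": "", r"[a-z]+": "word", r",": "comma"}

arithmetic_tokens = simple_scan(arithmetic, "2 + 3")
word_tokens = simple_scan(words, "foo, bar")
```

Because the table is ordinary data, swapping `arithmetic` for `words` changes the language being tokenized without touching the scanner itself.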
Install in your project with pip:
pip install pylexer
An example use:
from pylexer import PyLexer
config = {
    '\\s': '',
    '\\d+': 'number',
    '\\+': 'plus',
    '-': 'minus',
    '\\*': 'mul',
    '/': 'div',
}
# Static scan method that returns a list of tokens
tokens = PyLexer.scan(config, '2 + 3')
# Collect the name of each token
names = [token.get_name() for token in tokens]
# The config is a plain dict; the lexer can also be used as an instance to walk the tokens one at a time
lexer = PyLexer()
lexer.set_input('2 + 3')
lexer.move_next()
while lexer.get_look_ahead():
    print(lexer.get_look_ahead().get_name())
    lexer.move_next()
Tokens are defined with the TokenDefinition class, which holds a token name and a regular expression. The token name can be empty, in which case the lexer ignores (skips) matching tokens.
The lexer configuration holds a list of all token definitions. With LexerDictConfig, it can be created easily from a dict whose keys are regular expressions and whose values are token names.
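The relationship between a token definition and a dict-based configuration can be sketched as follows. The names `SketchTokenDefinition`, `is_skipped`, `match`, and `definitions_from_dict` are hypothetical stand-ins, not the real TokenDefinition or LexerDictConfig API, whose signatures are not shown here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTokenDefinition:
    """Hypothetical stand-in: a token name paired with a regular expression."""
    name: str
    pattern: str

    def is_skipped(self):
        # An empty name marks tokens the lexer should silently drop.
        return self.name == ""

    def match(self, text, pos):
        # Return the matched text starting exactly at pos, or None.
        found = re.compile(self.pattern).match(text, pos)
        return found.group() if found else None


def definitions_from_dict(mapping):
    """Build a definition list from {regex: name}, preserving dict order."""
    return [SketchTokenDefinition(name, pattern) for pattern, name in mapping.items()]


definitions = definitions_from_dict({r"\s": "", r"\d+": "number", "-": "minus"})
kept_names = [d.name for d in definitions if not d.is_skipped()]
```

Keeping the definitions in a list means rule order is explicit, which matters whenever two patterns could match at the same position.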
PyLexer's static scan method scans a given input string and returns a list of tokens. PyLexer can also be used to walk through the scanned tokens with a single-token lookahead.
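Single-token lookahead can be pictured as a cursor that always keeps the current token and the next one. The sketch below assumes the Doctrine-style convention in which each step promotes the lookahead to the current token. `SketchTokenWalker` and its attributes are hypothetical and only mirror the shape of `move_next()` and `get_look_ahead()`, not PyLexer's internals.

```python
class SketchTokenWalker:
    """Hypothetical walker over an already-scanned token list."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._position = 0
        self.token = None       # the current token
        self.look_ahead = None  # the token that comes next

    def move_next(self):
        # Promote the lookahead to current, then load the following token.
        self.token = self.look_ahead
        if self._position < len(self._tokens):
            self.look_ahead = self._tokens[self._position]
            self._position += 1
        else:
            self.look_ahead = None
        return self.look_ahead is not None


walker = SketchTokenWalker(["number", "plus", "number"])
seen = []
walker.move_next()  # prime the lookahead before the loop
while walker.look_ahead is not None:
    seen.append((walker.token, walker.look_ahead))
    walker.move_next()
```

A parser built on this shape can decide what to do with the current token by peeking at `look_ahead` without consuming it.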
MIT license. See LICENSE.md for more information.
PyLexer is inspired by the PHP Lexer (https://github.com/tmilos/lexer) and borrows heavily from the Doctrine API. All credit is due to Milos Tomic.