REPP tokenizer #43

Closed
goodmami opened this issue Dec 2, 2015 · 3 comments

@goodmami
Member

goodmami commented Dec 2, 2015

Make a module that reads .repp files and tokenizes strings, without firing up a parser. See here: http://moin.delph-in.net/ReppTop
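A minimal sketch of what such a module might do, assuming a heavily simplified rule format: each rewrite rule is a line starting with `!`, followed by a pattern, one or more tabs, and a replacement, and every other line is skipped. The names `parse_rules`, `tokenize`, and `SAMPLE` are hypothetical, and real REPP files have further operators (groups, includes, a tokenization pattern) that this ignores.

```python
import re

def parse_rules(text):
    """Collect (compiled pattern, replacement) pairs from simplified rule lines."""
    rules = []
    for line in text.splitlines():
        if not line.startswith("!"):
            continue  # skip comments, blanks, and operators this sketch ignores
        # Assumption: pattern and replacement are separated by one or more tabs.
        pattern, _, replacement = line[1:].partition("\t")
        rules.append((re.compile(pattern), replacement.lstrip("\t")))
    return rules

def tokenize(rules, s):
    """Apply each rewrite rule in order, then split on whitespace."""
    for pattern, replacement in rules:
        s = pattern.sub(replacement, s)
    return s.split()

# Hypothetical rule: put a space before sentence-final punctuation.
SAMPLE = "; separate trailing punctuation\n!([.,!?])$\t\t \\1\n"
```

With this sample, `tokenize(parse_rules(SAMPLE), "Hello world.")` splits the final period into its own token. Nothing here starts a parser; it is only regex substitution followed by a whitespace split.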

@alvations

+1 Reading .repp files will be a good first step =)

From Rebecca's code, reading the LISP .rpp settings files seems to involve some Boost regex compilation (boost::make_u32regex). Does anyone know whether those produce the same regexes as Python's re.compile? More questions at http://stackoverflow.com/questions/34048609/converting-c-boost-regexes-to-python-re-regexes

@goodmami
Member Author

goodmami commented Dec 3, 2015

The ReppTop page says the LKB uses a different regex engine, but both are Perl compatible (PCRE). The third-party regex module seems to have better Perl compatibility than Python's built-in re module.
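One cheap way to see how far the standard library gets is to try compiling every pattern and collect the ones `re` rejects. This is only a sketch: `compiles_with_re` and `report_incompatible` are hypothetical helper names, and a pattern that compiles may still behave differently from its PCRE or Boost counterpart, so this catches syntax gaps, not semantic ones.

```python
import re

def compiles_with_re(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

def report_incompatible(patterns):
    """Return the patterns that re refuses to compile, in input order."""
    return [p for p in patterns if not compiles_with_re(p)]
```

For example, `report_incompatible([r"\w+", "(unclosed"])` returns only the malformed second pattern. Running the same check over the patterns from a grammar's REPP files would show which constructs (for instance Unicode property classes like `\p{L}`, which as far as I know `re` does not accept) need attention.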

@goodmami goodmami modified the milestone: v0.5.0 Mar 7, 2016
@goodmami goodmami modified the milestones: v0.5.0, v0.6.0 Jun 3, 2016
@goodmami goodmami modified the milestones: v0.6.0, v0.7.0 Jan 19, 2017
@goodmami
Member Author

Following up on engine choice: regex is closer to PCRE than re is, but I am using re's internals (from the sre_parse module) to analyze the replacement patterns for characterization and string expansion purposes, so it's not as easy as just swapping out the engine.

Also note that there is python-pcre for actual PCRE, but it hasn't been updated in a while.

For now I will just use the regular re library and document its differences. Later I may try to minimally modify the patterns and replacement templates to be more compatible (e.g., changing \10 to \g<1>0 so it behaves like the C++ REPP).
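To illustrate the `\10` case: Python's re reads `\10` in a replacement template as a reference to group 10, whereas `\g<1>0` means group 1 followed by a literal `0`. A rough sketch of the rewrite, assuming templates never refer to more than nine groups and ignoring escaped backslashes (`disambiguate_backrefs` is a hypothetical name, not existing code):

```python
import re

# Assumption: at most 9 capture groups, so a backslash, a nonzero digit, and
# another digit means "single-digit group reference, then a literal digit".
_TWO_DIGIT_REF = re.compile(r"\\([1-9])(\d)")

def disambiguate_backrefs(template):
    """Rewrite e.g. \\10 as \\g<1>0 so re does not read it as group 10."""
    return _TWO_DIGIT_REF.sub(r"\\g<\1>\2", template)
```

With this, `re.sub(r"(a)", disambiguate_backrefs(r"\10"), "a")` expands to group 1 plus `0`, and a template like `\1` with no trailing digit is left untouched.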
