Creating a "catch all" token #2017
Hello @knpwrs
Your approach: Greedy Matching and Lookahead

Many of the regexp patterns you have tried seem to allow matching these two-character sequences (

I have not tested this, so there may be other issues, but I assume the regexp engine is greedy (attempts the longest match), which is the default AFAIK. It would then match the longest substring of the input that fits the pattern instead of the shortest one. Using non-greedy quantifiers (

Suggestion (try this)

My default approach in this case would be to not allow the pattern to match the two-character sequence

e.g.:

Edge Case

There is still an edge case where the last token in the input is a "free Text" which ends with a single
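One way to express that suggestion is sketched below. It is an illustrative pattern, not necessarily the exact one from the original reply. It accepts any character other than `{`, and accepts `{` only when the next character is neither `{` nor `%`. Because a negated class like `[^{]` also matches `\n`, multi-line text is consumed as a single token. The helper `matchTextAt` and the `^`-anchored copy of the pattern are hypothetical, added only to mimic a lexer trying the pattern at a given offset. A token definition would use the unanchored `liquidTextPattern`.

```typescript
// Candidate Text pattern: one or more characters, where "{" is accepted only
// when it does not start "{{" (Liquid object) or "{%" (Liquid tag).
// [^{] also matches "\n", so multi-line text is consumed as one token.
const liquidTextPattern = /(?:[^{]|\{(?![{%]))+/;

// Demo-only (hypothetical): anchor the same pattern with "^" and apply it to
// the rest of the input, mimicking a lexer that tries a token at `offset`.
const liquidTextAtOffset = new RegExp("^" + liquidTextPattern.source);

function matchTextAt(input: string, offset: number): string | null {
  const match = liquidTextAtOffset.exec(input.slice(offset));
  return match === null ? null : match[0];
}

const demoTemplate =
  "<!-- if array = [1,2,3,4,5,6] -->\n" +
  "{% for item in array limit:2 %}\n" +
  "{{ item }}\n" +
  "{% endfor %}\n" +
  "123456";

// Stops right before "{%", keeping the newline inside the Text match.
console.log(JSON.stringify(matchTextAt(demoTemplate, 0)));
// "<!-- if array = [1,2,3,4,5,6] -->\n"
console.log(matchTextAt(demoTemplate, 34)); // null: "{%" is left for TagStart
console.log(JSON.stringify(matchTextAt("a { b", 0))); // "a { b"
```

A trailing lone `{` (as in `price: {`) is also accepted, because the negative lookahead succeeds at the end of the input.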
I wound up trying a custom token:

```js
export const Text = createToken({
  name: 'Text',
  line_breaks: true,
  pattern: {
    exec: (text, startOffset) => {
      let endOffset = startOffset
      let charCode = text.charCodeAt(endOffset)
      let nextCharCode = text.charCodeAt(endOffset + 1)
      while (
        !Number.isNaN(charCode) &&
        !Number.isNaN(nextCharCode) &&
        charCode !== OpenBrace &&
        nextCharCode !== OpenBrace &&
        nextCharCode !== PercentSign
      ) {
        endOffset += 1
        charCode = text.charCodeAt(endOffset)
        nextCharCode = text.charCodeAt(endOffset + 1)
      }
      if (endOffset === startOffset) {
        return null
      }
      const match = text.substring(startOffset, endOffset)
      return [match]
    },
  },
})
```

And I am very confused by this output:

```
{
  "errors": [
    {
      "column": 34,
      "length": 1,
      "line": 1,
      "message": "unexpected character: ->
<- at offset: 33, skipped 1 characters.",
      "offset": 33,
    },
    {
      "column": 2,
      "length": 1,
      "line": 2,
      "message": "unexpected character: -> <- at offset: 67, skipped 1 characters.",
      "offset": 67,
    },
    {
      "column": 13,
      "length": 1,
      "line": 2,
      "message": "unexpected character: ->
<- at offset: 78, skipped 1 characters.",
      "offset": 78,
    },
    {
      "column": 26,
      "length": 1,
      "line": 2,
      "message": "unexpected character: ->
<- at offset: 91, skipped 1 characters.",
      "offset": 91,
    },
  ],
  "groups": {},
  "tokens": [
    {
      "endColumn": 33,
      "endLine": 1,
      "endOffset": 32,
      "image": "<!-- if array = [1,2,3,4,5,6] -->",
      "startColumn": 1,
      "startLine": 1,
      "startOffset": 0,
      "tokenType": {
        "CATEGORIES": [],
        "LINE_BREAKS": true,
        "PATTERN": {
          "exec": [Function],
        },
        "categoryMatches": [],
        "categoryMatchesMap": {},
        "isParent": false,
        "name": "Text",
        "tokenTypeIdx": 11,
      },
      "tokenTypeIdx": 11,
    },
    {
      "endColumn": 36,
      "endLine": 1,
      "endOffset": 35,
      "image": "{%",
      "startColumn": 35,
      "startLine": 1,
      "startOffset": 34,
      "tokenType": {
        "CATEGORIES": [],
        "PATTERN": /\\{%-\\?/,
        "PUSH_MODE": "tag",
        "categoryMatches": [],
        "categoryMatchesMap": {},
        "isParent": false,
        "name": "TagStart",
        "tokenTypeIdx": 10,
      },
      "tokenTypeIdx": 10,
    },
```

Why would the line breaks be unexpected? I have

I'm also thinking perhaps it would be beneficial for Chevrotain to ship an official Mustache Template Syntax lexer/parser. Mustache is the simplest language I'm aware of for this style of templates, and it would demonstrate how to work around this problem for all similar languages.
If you want your

I suspect you may have a logical bug where your loop halts one index before the expected position, e.g.:

You should also test the edge case of a template that ends with plain text, e.g.:

```
<!-- if array = [1,2,3,4,5,6] -->
{% for item in array limit:2 %}
{{ item }}
{% endfor %}
123456
```
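Following that diagnosis, here is a hedged sketch of the custom matcher with the off-by-one problems removed. Two things in the posted loop explain the errors. It stops as soon as the *next* character is `{`, so the character just before `{%` (the newline at offset 33 in the output) is never consumed and is reported as unexpected. And the `!Number.isNaN(nextCharCode)` check stops the loop one character before the end of the input, so trailing text such as `123456` loses its last character. The loop also stops before any `%`, so text such as `50% off` would fail too. The sketch instead checks, at each position, whether a `{{` or `{%` actually *starts* there. The constants `OPEN_BRACE`/`PERCENT_SIGN` and the function names are hypothetical stand-ins (the question's `OpenBrace`/`PercentSign` are not shown).

```typescript
// Hypothetical stand-ins for the question's OpenBrace / PercentSign constants.
const OPEN_BRACE = "{".charCodeAt(0);
const PERCENT_SIGN = "%".charCodeAt(0);

// True when "{{" (Liquid object) or "{%" (Liquid tag) starts at `index`.
// charCodeAt past the end returns NaN, which never equals either constant.
function opensLiquidDelimiter(text: string, index: number): boolean {
  if (text.charCodeAt(index) !== OPEN_BRACE) {
    return false;
  }
  const next = text.charCodeAt(index + 1);
  return next === OPEN_BRACE || next === PERCENT_SIGN;
}

// Same contract as the question's `exec`: return [matchedText] or null.
function execLiquidText(text: string, startOffset: number): [string] | null {
  let endOffset = startOffset;
  // Bound on endOffset itself (not endOffset + 1) so the last character of
  // the input is consumed; stop only where a delimiter actually begins.
  while (endOffset < text.length && !opensLiquidDelimiter(text, endOffset)) {
    endOffset += 1;
  }
  if (endOffset === startOffset) {
    return null; // nothing to consume: leave this position to ObjectStart/TagStart
  }
  return [text.substring(startOffset, endOffset)];
}
```

It is meant to slot into the same `pattern: { exec: ... }` shape as the question's `createToken` call, keeping `line_breaks: true` since the matched text can span lines.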
"Official" and "ship" are beyond the scope of the provided examples, as most of those are non-productive. But a smaller, more focused "catch all" token example PR would be positively reviewed if you are interested in contributing it.
My custom pattern wound up being problematic, so I used your suggested pattern and it's working well so far. I'd love to contribute an example, maybe after this project wraps and I gain some confidence in how it's all working together. Thank you for your help!
I am using Chevrotain to try and make a lexer for the Liquid templating language. Consider the following template (including the comment):

As a first step, I am making a multi-mode lexer. The first mode, `main`, has three tokens:

Already you can see a problem with my `Text` token in that it will consume everything, since `ObjectStart` and `TagStart` don't match. Essentially I want to match everything up until either `{{` opens a Liquid object or `{%` opens a Liquid tag. I've tried `/(?!{{|{%)+/`, but this pattern matches empty strings. `/(.+)(?:{{|{%)?/` appears to work, but in every case, including `/[\s\S]+/`, I am hitting something that I simply do not understand.

My lexer returns the following errors:

The initial `<!--` does not match, and then the tokens start at `if array`. With `pattern` set to `/(.+)(?:{{|{%)?/`, I get the following errors:

Basically every new line is unexpected.
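The behaviour of these patterns can be reproduced with plain JavaScript regular expressions, outside Chevrotain. The sketch below (the helper `matchAtStart` is hypothetical, and the braces in the question's pattern are escaped) shows three things: `.` never matches a line terminator, so a `.`-based Text pattern cannot start at a `\n` and the newline is left unmatched; greedy `.+` also runs past a `{{` later on the same line, so the pattern only appears to work when every delimiter starts a new line; and a pattern made only of a negative lookahead is zero-width, so it matches the empty string.

```typescript
// Demo-only helper (hypothetical): apply `pattern` anchored at the start of `input`.
function matchAtStart(pattern: RegExp, input: string): string | null {
  const match = new RegExp("^(?:" + pattern.source + ")").exec(input);
  return match === null ? null : match[0];
}

// The question's pattern, with "{" escaped.
const dotTextPattern = /(.+)(?:\{\{|\{%)?/;

// "." stops at "\n" ...
console.log(JSON.stringify(matchAtStart(dotTextPattern, "<!-- c -->\n{% if x %}")));
// "<!-- c -->"
// ... and cannot start at "\n", so nothing consumes the newline.
console.log(matchAtStart(dotTextPattern, "\n{% if x %}")); // null
// Greedy ".+" runs past a "{{" on the same line; the optional group then matches "".
console.log(matchAtStart(dotTextPattern, "Hi {{ name }}!")); // Hi {{ name }}!

// A lookahead alone consumes nothing, so it "matches" the empty string.
const lookaheadOnly = /(?!\{\{|\{%)/;
console.log(JSON.stringify(matchAtStart(lookaheadOnly, "plain text"))); // ""
```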
I've also tried a variation that moves the `/(.+)(?:{{|{%)?/` pattern to the front of the mode, but that produces errors of its own.

What is the best way to create a "catch all" token that captures everything up until another token in the current mode would be valid?
Semantically, in a Liquid template everything that is outside of an object (`{{ }}`) or a tag (`{% %}`) is just text.

EDIT: I've also tried `/([\s\S]+)(?:{{|{%)?/`, and this appears to produce the same errors as originally.
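The EDIT's pattern behaves like `/[\s\S]+/` because of greedy matching: `[\s\S]+` first runs to the end of the input, and the trailing optional group then matches the empty string, so the engine has no reason to back off to a `{{` or `{%`. The sketch below (helper name hypothetical, braces escaped) contrasts it with a non-greedy variant that stops before the next delimiter, using `(?![\s\S])` to mean "end of input". This is an assumption-laden illustration rather than a tested Chevrotain token: the lazy pattern still consumes at least one character, so at a position that begins with `{{` or `{%` it relies on `ObjectStart`/`TagStart` taking precedence.

```typescript
// Demo-only helper (hypothetical): apply `pattern` anchored at the start of `input`.
function matchFromStart(pattern: RegExp, input: string): string | null {
  const match = new RegExp("^(?:" + pattern.source + ")").exec(input);
  return match === null ? null : match[0];
}

const sampleInput = "Hello\n{{ name }}\nBye";

// The EDIT's pattern (braces escaped): greedy [\s\S]+ reaches the end of the
// input, where the optional group matches "", so it never backs off.
const greedyWithOptionalSuffix = /([\s\S]+)(?:\{\{|\{%)?/;
console.log(matchFromStart(greedyWithOptionalSuffix, sampleInput) === sampleInput); // true

// Non-greedy variant: take as few characters as possible, then require that a
// delimiter or the end of input ((?![\s\S]) = "no character follows") comes next.
const lazyUntilDelimiter = /[\s\S]+?(?=\{\{|\{%|(?![\s\S]))/;
console.log(JSON.stringify(matchFromStart(lazyUntilDelimiter, sampleInput))); // "Hello\n"
console.log(JSON.stringify(matchFromStart(lazyUntilDelimiter, "tail"))); // "tail"
```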