Ignore case new syntax #1002

Closed
KvanTTT opened this issue Sep 21, 2015 · 11 comments

@KvanTTT
Member

KvanTTT commented Sep 21, 2015

What do you think about a new lexer command for ignoring case?

Procedure: 'procedure' -> ignoreCase;

Or use a regex-like syntax (which gives more flexibility, for example ignoring case for only part of a word):

Procedure: '\iprocedure\-i';
Procedure: '\i' 'procedure' '\-i';

Currently I have to use the following ugly syntax:

Procedure: P R O C E D U R E;
fragment A: [aA];
fragment B: [bB];
fragment C: [cC];
fragment D: [dD];
fragment E: [eE];
fragment F: [fF];
fragment G: [gG];
fragment H: [hH];
fragment I: [iI];
fragment J: [jJ];
fragment K: [kK];
fragment L: [lL];
fragment M: [mM];
fragment N: [nN];
fragment O: [oO];
fragment P: [pP];
fragment Q: [qQ];
fragment R: [rR];
fragment S: [sS];
fragment T: [tT];
fragment U: [uU];
fragment V: [vV];
fragment W: [wW];
fragment X: [xX];
fragment Y: [yY];
fragment Z: [zZ];

This feature is especially important for case-insensitive languages, such as Pascal, PHP, etc.
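The per-letter fragment workaround above is mechanical, so it can be generated rather than typed by hand. Below is a minimal sketch, assuming keywords consist only of ASCII letters, digits, and underscores. The class `CaseInsensitiveRuleSketch` and its methods are hypothetical helpers for illustration and are not part of ANTLR. It emits inline character sets such as `[dD]` instead of named fragments.

```java
/** Hypothetical helper for illustration; not part of ANTLR. */
final class CaseInsensitiveRuleSketch {

    /**
     * Expands a keyword into a sequence of two-letter character sets,
     * e.g. "do" becomes "[dD] [oO]". Digits and underscores are emitted
     * as quoted literals because they have no case.
     */
    static String expand(String keyword) {
        StringBuilder sb = new StringBuilder();
        for (char c : keyword.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digitOrUnderscore = (c >= '0' && c <= '9') || c == '_';
            if (asciiLetter) {
                sb.append('[')
                  .append(Character.toLowerCase(c))
                  .append(Character.toUpperCase(c))
                  .append(']');
            } else if (digitOrUnderscore) {
                sb.append('\'').append(c).append('\'');
            } else {
                // Assumption: anything else would need escaping or
                // locale-aware handling, which this sketch does not attempt.
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
        }
        return sb.toString();
    }

    /** Builds a complete lexer rule line from a rule name and a keyword. */
    static String lexerRule(String ruleName, String keyword) {
        return ruleName + ": " + expand(keyword) + ";";
    }

    public static void main(String[] args) {
        System.out.println(lexerRule("Procedure", "procedure"));
    }
}
```

This only automates the workaround; the grammar still contains the expanded form, which is what the proposed syntax is meant to avoid.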

@jimidle
Collaborator

jimidle commented Sep 21, 2015

Don't create all those fragment rules; just override LA in the input stream and always return toLower.
Jim
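A rough sketch of the idea, for readers who have not seen it: the lexer only ever looks at characters through `LA`, so lowercasing there makes all-lowercase literals match any casing, while the original text stays untouched for the token. The classes below are hypothetical stand-ins written for this example and are not ANTLR runtime classes; with the real runtime the same override would go on whichever input stream class the target provides.

```java
/** Hypothetical stand-in for a lexer input stream; not an ANTLR class. */
class SimpleCharStreamSketch {
    static final int EOF = -1;

    protected final String data;
    protected int pos = 0;

    SimpleCharStreamSketch(String data) {
        this.data = data;
    }

    /** Looks ahead i characters (1-based) without consuming anything. */
    int LA(int i) {
        int index = pos + i - 1;
        return (i >= 1 && index < data.length()) ? data.charAt(index) : EOF;
    }

    /** Advances past the current character, if any. */
    void consume() {
        if (pos < data.length()) {
            pos++;
        }
    }

    /** Returns the original, case-preserved text for an inclusive index range. */
    String getText(int start, int stop) {
        return data.substring(start, stop + 1);
    }
}

/** Lowercases lookahead only; the underlying text is never modified. */
class LowerCasingStreamSketch extends SimpleCharStreamSketch {

    LowerCasingStreamSketch(String data) {
        super(data);
    }

    @Override
    int LA(int i) {
        int c = super.LA(i);
        return c == EOF ? EOF : Character.toLowerCase(c);
    }

    /** Checks whether the upcoming input matches an all-lowercase keyword. */
    static boolean matches(SimpleCharStreamSketch in, String lowerKeyword) {
        for (int k = 0; k < lowerKeyword.length(); k++) {
            if (in.LA(k + 1) != lowerKeyword.charAt(k)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, `matches(new LowerCasingStreamSketch("PROCEDURE foo"), "procedure")` succeeds, and `getText(0, 8)` still yields the text as written in the input. Note that the lowercasing applies to the whole stream, so it cannot by itself make only some tokens case-insensitive.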


@KvanTTT
Member Author

KvanTTT commented Sep 21, 2015

Don't create all those fragment rules; just override LA in the input stream and always return toLower.

@jimidle, firstly, your solution is runtime-dependent. Secondly, the parsed language may be case-insensitive in only some of its parts (PHP, for example).

There is this solution: https://github.com/developeron29/PLSQLParser/blob/master/PLSQL.g4#L534
But it is ugly, runtime-dependent (Java) and, I would guess, its performance is not optimal.

For example, the following tokens share a common prefix, so matching them can be optimized.

DO
    : D O
    ;

DOWNTO
    : D O W N T O
    ;

@jimidle
Collaborator

jimidle commented Sep 21, 2015

It is faster than the fragment rules, and because you specify only lowercase letters in the lexer rules, it optimizes better. I came up with the original method and did the performance test. Regular expressions won't be implemented. So stick with your current method if you feel that overriding a single method is too much trouble.


@KvanTTT
Member Author

KvanTTT commented Sep 25, 2015

@jimidle, could you please show a grammar sample that illustrates your approach to case-insensitive tokens?

@jimidle
Collaborator

jimidle commented Sep 26, 2015

I believe I have already put this in the public domain but of course your own needs may be different. 
Happy to share but here in Taiwan it is a public holiday so I will look again on Tuesday. 


KvanTTT changed the title from "Ignore case new syntax" to "Ignore case new syntax type:feature" on Oct 6, 2015
KvanTTT changed the title from "Ignore case new syntax type:feature" to "Ignore case new syntax" on Oct 6, 2015
@KvanTTT
Member Author

KvanTTT commented Oct 13, 2015

Actually, ignoreCase is not a lexer command; it is an option that makes characters match regardless of case. So I would go for the following syntax (consistent with the current ANTLR grammar):

Procedure: 'procedure' <ignoreCase=true>;

Or even easier:

Procedure: 'procedure' <ignoreCase>;

Here it is stated that "there is no ANTLR option that enables case insensitivity, as this is hard or impossible to do completely correctly, taking into account all possible internationalization issues", but in most cases non-English characters are rarely used. Moreover, a warning could be emitted: "Lexer rule with non-English chars and ignoreCase option may produce incorrect output". And in most cases performance with fragment rules is good. So I think this option has a right to exist.
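To make the internationalization concern concrete, here is a small sketch using only standard `java.lang` and `java.util` calls. It shows two well-known cases where case conversion is not a simple one-to-one character mapping, plus a check that could back the proposed warning. The class `CaseFoldingPitfallsSketch` and the method `hasNonAsciiChars` are hypothetical names chosen for this example.

```java
import java.util.Locale;

/** Hypothetical illustration of why general case folding is hard. */
final class CaseFoldingPitfallsSketch {

    /** German sharp s: the string-level upper case is two characters long. */
    static String upperOfSharpS() {
        return "\u00DF".toUpperCase(Locale.ROOT);
    }

    /** The char-level mapping leaves sharp s unchanged, unlike the string-level one. */
    static boolean charLevelUpperIsUnchanged() {
        return Character.toUpperCase('\u00DF') == '\u00DF';
    }

    /** In a Turkish locale, capital I lowercases to dotless i (U+0131), not "i". */
    static String lowerOfCapitalIInTurkish() {
        return "I".toLowerCase(Locale.forLanguageTag("tr"));
    }

    /** Candidate trigger for the proposed warning: any character outside ASCII. */
    static boolean hasNonAsciiChars(String literal) {
        for (int k = 0; k < literal.length(); k++) {
            if (literal.charAt(k) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(upperOfSharpS().length());
        System.out.println(charLevelUpperIsUnchanged());
        System.out.println(lowerOfCapitalIInTurkish().equals("i"));
        System.out.println(hasNonAsciiChars("procedure"));
    }
}
```

For keywords such as `procedure`, which are pure ASCII, none of these cases arise, which is the situation the proposed option targets.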

@yuriry

yuriry commented Feb 21, 2017

I like the <ignoreCase> option. It is not as convenient as JavaCC's TOKEN[IGNORE_CASE], where multiple tokens can be marked as case-insensitive, but it is way better than the various alternatives I've seen so far.

@KvanTTT
Member Author

KvanTTT commented Feb 21, 2017

Yes, the option is useful, but this issue is a bit outdated. It's not possible to use the option for a single token, but it is possible to use it for a mode or for the entire grammar. See the PR with a case-insensitive proof of concept: #1092. Also, we should use something like @sharwell's CaseInsensitiveInputStream.java instead of my approach with fragment tokens.

@yuriry

yuriry commented Feb 23, 2017

It seems that none of these options are available in the 4.4.6 release :(

@stefv

stefv commented Jul 28, 2017

I agree. It's sad news, because this feature would be useful in the grammar itself: it would allow testing immediately with the tools, without having to write a program with a special input stream.

@KvanTTT
Member Author

KvanTTT commented Dec 29, 2021

Eventually fixed by #3399 and #3437

@KvanTTT KvanTTT closed this as completed Dec 29, 2021