Ignore case new syntax #1002

KvanTTT · 2015-09-21T11:03:51Z

What do you think about a new lexer command for ignoring case?

Procedure: 'procedure' -> ignoreCase;

Or a regex-like syntax, which gives more flexibility (for example, ignoring case for only part of a word):

Procedure: '\iprocedure\-i';
Procedure: '\i' 'procedure' '\-i';

Currently I have to use the following unattractive syntax:

Procedure: P R O C E D U R E;
fragment A: [aA];
fragment B: [bB];
fragment C: [cC];
fragment D: [dD];
fragment E: [eE];
fragment F: [fF];
fragment G: [gG];
fragment H: [hH];
fragment I: [iI];
fragment J: [jJ];
fragment K: [kK];
fragment L: [lL];
fragment M: [mM];
fragment N: [nN];
fragment O: [oO];
fragment P: [pP];
fragment Q: [qQ];
fragment R: [rR];
fragment S: [sS];
fragment T: [tT];
fragment U: [uU];
fragment V: [vV];
fragment W: [wW];
fragment X: [xX];
fragment Y: [yY];
fragment Z: [zZ];

This feature is especially important for case-insensitive languages such as Pascal and PHP.
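
The fragment boilerplate above is mechanical, so it can be generated rather than typed by hand. A minimal sketch in Python, assuming ASCII-only keywords; the helper names `fragment_rules` and `keyword_rule` are hypothetical and not part of ANTLR:

```python
import string


def fragment_rules():
    """Return the 26 per-letter fragment rules, e.g. 'fragment A: [aA];'."""
    return [f"fragment {c.upper()}: [{c}{c.upper()}];"
            for c in string.ascii_lowercase]


def keyword_rule(rule_name, keyword):
    """Spell an ASCII keyword as a sequence of per-letter fragment references."""
    if not (keyword.isascii() and keyword.isalpha()):
        # Assumption of this sketch: only the letters a-z / A-Z are handled.
        raise ValueError("only ASCII letters are supported in this sketch")
    return f"{rule_name}: {' '.join(keyword.upper())};"


if __name__ == "__main__":
    print(keyword_rule("Procedure", "procedure"))
    print("\n".join(fragment_rules()))
```

This only automates the workaround; it does not remove the need for the per-letter fragments in the grammar itself.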

jimidle · 2015-09-21T11:21:39Z

Don't create all those fragment rules; just override LA in the input stream and always return the lower-cased character.
Jim
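
The idea can be sketched without the ANTLR runtime. The Python class below is a simplified stand-in for a character stream, not ANTLR's actual CharStream API: `LA(offset)` folds the looked-at character to lower case for matching, while `text(start, stop)` still returns the original input, so the token text keeps its casing. The class, its method shapes, and the `match_keyword` helper are assumptions made for illustration.

```python
EOF = -1


class LowerCaseLookaheadStream:
    """Toy character stream whose lookahead is case-folded to lower case."""

    def __init__(self, data):
        self.data = data
        self.index = 0

    def LA(self, offset):
        """Code point `offset` chars ahead (1-based), lower-cased; EOF past the end."""
        if offset < 1:
            raise ValueError("this sketch only supports forward lookahead")
        pos = self.index + offset - 1
        if pos >= len(self.data):
            return EOF
        ch = self.data[pos]
        lower = ch.lower()
        # Some characters lower-case to more than one code point;
        # in that case fall back to the original character.
        return ord(lower) if len(lower) == 1 else ord(ch)

    def consume(self):
        self.index += 1

    def text(self, start, stop):
        """Original (not lower-cased) text in the half-open range [start, stop)."""
        return self.data[start:stop]


def match_keyword(stream, keyword):
    """Match a lower-case keyword at the current position.

    Returns the matched text in its original casing, or None on mismatch.
    """
    start = stream.index
    for i, expected in enumerate(keyword, start=1):
        if stream.LA(i) != ord(expected):
            return None
    for _ in keyword:
        stream.consume()
    return stream.text(start, stream.index)
```

With this shape, the lexer rules only ever need to mention lower-case letters, and the folding happens in one place. The trade-off is that the wrapper lives in runtime code rather than in the grammar.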


KvanTTT · 2015-09-21T11:37:09Z

"Don't create all those fragment rules; just override LA in the input stream and always return the lower-cased character."

@jimidle, first, your solution is runtime-dependent. Second, the parsed language may be case-insensitive only in some parts (as in PHP).

There is a solution of this kind: https://github.com/developeron29/PLSQLParser/blob/master/PLSQL.g4#L534
But it is ugly, runtime-dependent (Java) and, I would guess, does not have optimal performance.

For example, the following tokens share a common prefix, so matching them could be optimized.

DO
    : D O
    ;

DOWNTO
    : D O W N T O
    ;

jimidle · 2015-09-21T12:22:11Z

It is faster than the fragment rules, and because you specify only lower-case letters in the lexer rules, it optimizes better. I came up with the original method and did the performance tests. Regular expressions won't be implemented. So stick with your current method if you feel that overriding a single method is too much trouble.


KvanTTT · 2015-09-25T22:30:18Z

@jimidle, could you please show a grammar sample illustrating your approach to case-insensitive tokens?

jimidle · 2015-09-26T08:02:02Z

I believe I have already put this in the public domain but of course your own needs may be different.
Happy to share but here in Taiwan it is a public holiday so I will look again on Tuesday.


KvanTTT · 2015-10-13T21:14:55Z

Actually, ignoreCase is not a lexer command; it is an option to treat characters as case-insensitive. So I would go for the following syntax (consistent with the current ANTLR grammar):

Procedure: 'procedure' <ignoreCase=true>;

Or even easier:

Procedure: 'procedure' <ignoreCase>;

It is described here that "there is no ANTLR option that enables case insensitivity, as this is hard or impossible to do completely correctly, taking into account all possible internationalization issues", but in most cases non-English characters are rarely used. Moreover, a warning could be issued: "Lexer rule with non-English chars and ignoreCase option may produce incorrect output". And in most cases performance with fragment rules is good. So I think this option has a right to exist.
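
To make the internationalization concern concrete, here is a small Python sketch. It shows a few characters whose upper/lower forms do not pair up one-to-one, plus a check that could back the warning proposed above. Both function names are hypothetical, and the "non-English" test is simplified to "non-ASCII cased character", which is an assumption of this sketch.

```python
def has_simple_case_pair(ch):
    """True if ch's upper and lower forms are single characters that round-trip."""
    lower, upper = ch.lower(), ch.upper()
    return (len(lower) == 1 and len(upper) == 1
            and lower.upper() == upper and upper.lower() == lower)


def needs_ignore_case_warning(literal):
    """True if the literal contains a non-ASCII character that has case."""
    return any(ord(ch) > 127 and ch.lower() != ch.upper() for ch in literal)


if __name__ == "__main__":
    # German sharp s upper-cases to two characters ("SS").
    print(has_simple_case_pair("\u00df"))
    # Turkish dotless i upper-cases to "I", which lower-cases back to "i".
    print(has_simple_case_pair("\u0131"))
    # ASCII keywords never trigger the warning.
    print(needs_ignore_case_warning("procedure"))
    print(needs_ignore_case_warning("stra\u00dfe"))
```

For ASCII-only keywords, which cover the Pascal and PHP cases mentioned earlier, a plain `[aA]`-style pairing is unambiguous; the check only flags literals where that assumption may not hold.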

yuriry · 2017-02-21T03:10:08Z

I like the <ignoreCase> option. It is not as convenient as JavaCC's TOKEN[IGNORE_CASE], where multiple tokens can be marked as case-insensitive, but it is far better than the various alternatives I've seen so far.

KvanTTT · 2017-02-21T10:17:59Z

Yes, the option is useful, but this issue is a bit outdated. It is not possible to use the option for a single token, but it is possible to use it for a mode or for the entire grammar. See the PR with a case-insensitive proof of concept: #1092. Also, we should use something like @sharwell's CaseInsensitiveInputStream.java instead of my approach with fragment tokens.

yuriry · 2017-02-23T16:25:26Z

It seems that none of these options are available in the 4.4.6 release :(

stefv · 2017-07-28T08:32:41Z

I agree, it's sad news, because this feature would be useful in the grammar: it would allow testing immediately with the tools, without having to write a program with a special input stream.

KvanTTT · 2021-12-29T00:07:36Z

Eventually fixed by #3399 and #3437

KvanTTT changed the title ~~Ignore case new syntax~~ Ignore case new syntax type:feature Oct 6, 2015

KvanTTT changed the title ~~Ignore case new syntax type:feature~~ Ignore case new syntax Oct 6, 2015

KvanTTT mentioned this issue Jan 8, 2016

[RFC] Case Insensitivity Proof of Concept #1092

Closed

KvanTTT mentioned this issue Nov 22, 2016

[RFC] Add channel command for modes #1390

Closed

KvanTTT mentioned this issue Sep 28, 2017

New case insensitive options antlr/antlr4test-maven-plugin#1

Closed

bramp mentioned this issue Oct 6, 2017

Add a new CharStream that converts the symbols to upper or lower case. #2046

Closed

carlspring mentioned this issue Jun 16, 2018

AQL case insensetive keywords strongbox/strongbox#721

Closed


KvanTTT closed this as completed Dec 29, 2021
