Minor request: \v for vertical spacing #477
I can see that the code suggests a \v pseudo-class, but then I cannot understand why it doesn't work:
>>> '\v'
'\x0b'
>>> '\v' == '\N{LINE TABULATION}'
True
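For reference, a short check with the stdlib re module (a sketch, not taken from the thread) shows that \v inside a pattern keeps its string-literal meaning: it matches only U+000B, not line feeds or other vertical whitespace.

```python
import re

# In a Python string literal, "\v" is simply the vertical tab, U+000B.
assert "\v" == "\x0b" == "\N{LINE TABULATION}"

# re also accepts \v in a pattern, with the same meaning: only U+000B.
vt = re.fullmatch(r"\v", "\x0b") is not None           # True
lf = re.fullmatch(r"\v", "\n") is not None             # False
ls = re.fullmatch(r"\v", "\u2028") is not None         # False
print(vt, lf, ls)  # True False False
```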
Thanks for the prompt reply! Any ideas on matching vertical whitespace?
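There are far fewer characters that need to match: [\x0A\x0B\x0C\x0D\x85\u2028\u2029] or [\x0A-\x0D\x85\u2028\u2029]. Maybe it could be added as \V, although that would be inconsistent with \h, and there are pairs of lowercase/uppercase escape codes where the uppercase one is the negative of the lowercase one, e.g. \d and \D. On the other hand, those implementations that have \h and \v don't have \H and \V. Also, I don't want to add something that the re module might do differently if it were added later. That's why it hasn't been added already.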
Okay, makes perfect sense (still sad for my downstream task).
On Tue, Aug 16, 2022 at 5:18 PM mrabarnett wrote:
There are far fewer characters that need to match:
[\x0A\x0B\x0C\x0D\x85\u2028\u2029] or [\x0A-\x0D\x85\u2028\u2029].
Maybe it could be added as \V, although that would be inconsistent with \h,
and there are pairs of lowercase/uppercase escape codes where the uppercase
one is the negative of the lowercase one, e.g. \d and \D. On the other
hand, those implementations that have \h and \v don't have \H and \V.
Also, I don't want to add something that the re module might do
differently if it were added later.
That's why it hasn't been added already.
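The vertical-space set quoted above (LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR) can already be written as an ordinary character class in the stdlib re module. A minimal sketch; the constant names are illustrative only:

```python
import re

# The vertical-space set, spelled out and as a range.
# The range needs a backslash on both ends: \x0A-\x0D.
EXPLICIT = re.compile(r"[\x0A\x0B\x0C\x0D\x85\u2028\u2029]")
RANGED = re.compile(r"[\x0A-\x0D\x85\u2028\u2029]")

# Both spellings accept exactly the same characters (checked over the BMP).
same = all(
    (EXPLICIT.fullmatch(chr(cp)) is None) == (RANGED.fullmatch(chr(cp)) is None)
    for cp in range(0x10000)
)

# Without the second backslash, [\x0A-x0D...] is the range \x0A..'x' plus
# '0' and 'D', which wrongly matches most ASCII letters and digits.
typo_matches_letter = re.fullmatch(r"[\x0A-x0D\x85\u2028\u2029]", "a") is not None

# Splitting text on runs of vertical whitespace.
VSPACE_RUN = re.compile(r"[\x0A-\x0D\x85\u2028\u2029]+")
lines = VSPACE_RUN.split("one\r\ntwo\u2028three\x0bfour")
print(same, typo_matches_letter, lines)  # True True ['one', 'two', 'three', 'four']
```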
I've come across a mention of \H and \V, so using \V would be a bad idea.
Maybe something like a pseudo-character class, like [:blank:]?
On Tue, Aug 16, 2022 at 9:39 PM mrabarnett wrote:
I've come across a mention of \H and \V, so using \V would be a bad idea.
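Now I'm thinking about \y and \Y, which look a little like \v and \V. PostgreSQL uses them instead of \b and \B, which every other implementation that I know of uses, possibly because \b normally represents \x08 outside regex, and still does within character classes. I want the regex module to remain compatible with the re module, and just in case they ever get added there in the future, I'm soliciting opinions on python-dev.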
I've added
Wow, many thanks!
Well, you could probably add it to V1? It already goes somewhat beyond the original re :)
On Tue, Aug 16, 2022 at 11:29 PM mrabarnett wrote:
Now I'm thinking about \y and \Y, which look a little like \v and \V.
PostgreSQL uses them instead of \b and \B, which every other
implementation that I know of uses, possibly because \b normally
represents \x08 outside regex, and still does within character classes.
I want the regex module to remain compatible with the re module, and just
in case they ever get added there in the future, I'm soliciting opinions on
python-dev.
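The \b ambiguity mentioned in the quoted comment can be seen in Python's stdlib re module. This is a small illustration only, not part of the regex module or of anything proposed in the thread:

```python
import re

# Outside a character class, \b is a word-boundary assertion.
standalone = re.search(r"\bcat\b", "a cat sat") is not None      # True
inside_word = re.search(r"\bcat\b", "concatenate") is not None   # False

# Inside a character class, \b keeps its string-literal meaning: backspace (\x08).
backspace = re.fullmatch(r"[\b]", "\x08") is not None   # True
letter_b = re.fullmatch(r"[\b]", "b") is not None       # False
print(standalone, inside_word, backspace, letter_b)  # True False True False
```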
Given the feedback on python-dev, I won't be adding \y and \Y. What I've already added should suffice.
Totally works for me!
Thanks a lot for looking into it again!
On Tue, Oct 11, 2022 at 8:14 PM mrabarnett wrote:
Given the feedback on python-dev, I won't be adding \y and \Y. What I've
already added should suffice.
Hi!
I'm using the regex library to port the LanguageTool libraries (originally written in Java) for sentence and word tokenization.
They rely heavily on \v and \h. Some of those rules ship in XML files full of regexes, and I'd rather not alter them, so that I don't have to maintain a separate copy. I can sort of work around it by replacing \v with VERTICAL_SPACE: str = "\u000a\u000b\u000c\u000d\u0085\u2028\u2029", but that's another tiny nightmare, because \v shows up in different contexts: \v*, [\v\t]*, etc.
Please consider adding support for \v as vertical whitespace.
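A minimal sketch of the rewriting workaround described above, assuming the rest of each pattern is already valid Python re syntax. The helper expand_vertical_space and the constant VSPACE_CHARS are hypothetical names. The sketch handles only \v (not \h). It tracks whether it is inside a character class so that both \v* and [\v\t]* come out right, and it does not understand nested sets.

```python
import re

# Characters matched by Java/Perl-style \v, as listed in this thread
# (LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR), written as
# re escapes so they can be pasted inside or outside a character class.
VSPACE_CHARS = r"\n\x0b\f\r\x85\u2028\u2029"


def expand_vertical_space(pattern: str) -> str:
    # Hypothetical helper: rewrite each \v escape in `pattern` so that the
    # stdlib re module treats it as vertical whitespace, not just U+000B.
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "v":
                # Inside [...] splice in the bare escapes; outside, wrap them
                # in brackets so a following quantifier applies to the whole set.
                out.append(VSPACE_CHARS if in_class else "[" + VSPACE_CHARS + "]")
            else:
                # Any other escape pair (including \\ and \]) is copied as-is.
                out.append(ch + nxt)
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
            out.append(ch)
            i += 1
            # A ']' directly after '[' or '[^' is a literal, not the closing bracket.
            if i < len(pattern) and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < len(pattern) and pattern[i] == "]":
                out.append("]")
                i += 1
            continue
        if ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)


print(expand_vertical_space(r"\v*"))      # [\n\x0b\f\r\x85\u2028\u2029]*
print(expand_vertical_space(r"[\v\t]*"))  # [\n\x0b\f\r\x85\u2028\u2029\t]*
```

Because the XML rule files stay untouched, the rewrite can run once when each rule is loaded, before re.compile. A horizontal-space set for \h could be handled the same way, but that set is not spelled out in this thread, so it is left out here.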