(*THEN) broken inside condition subpattern #11443
From ph10@hermes.cam.ac.uk
Created by ph10@cam.ac.uk
It seems to me that, if what precedes (*THEN) in a branch matches
Pattern: /^.*?(?(?=a)a|b(*THEN)c)/
Pattern: /^.*?(?(?=a)a|bc)/
I noticed this because I have just fixed the same bug in PCRE.
Philip Hazel
Perl Info
From @cpansprout
On Wed Jun 15 11:26:50 2011, ph10@hermes.cam.ac.uk wrote:
Do the pipes in the (?(...)...) condition expression count as regular |
The RT System itself - Status changed from 'new' to 'open'
From ph10@hermes.cam.ac.uk
On Sun, 11 Sep 2011, Father Chrysostomos via RT wrote:
The documentation for THEN says that it tries "the next alternation in So yes, I guess it all depends on whether or not the branches of a While thinking about this and experimenting, I've just discovered Pattern: /a+?(*THEN)c/ However, PCRE matches only "ac". The same thing happens with (*PRUNE). Regards, -- |
From @cpansprout
On Mon Sep 12 06:24:25 2011, ph10@hermes.cam.ac.uk wrote:
So you are saying that (?(condition)foo(*THEN)bar|baz) should jump out I think it ends up being too confusing. The | in a conditional has $ perl -le' /(?(1)foo|bar|baz)/'
That’s strange. In 5.14 it doesn’t match. I don’t know which is worse.
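As a side illustration of how ambiguous a third `|` inside a conditional is, here is a minimal sketch in Python's `re` module, used only as a stand-in (it is not Perl and not PCRE). The helper name `compiles` and the leading `(x)?` group are assumptions added so that group 1 exists; the sketch only shows that this particular engine refuses a conditional with more than two branches unless the extra alternation is wrapped in its own group.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern, False on a syntax error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Two branches: the single | separates the yes-pattern from the no-pattern.
two_branches = r"(x)?(?(1)foo|bar)"

# Three branches: Python's re rejects the second top-level | in a conditional.
three_branches = r"(x)?(?(1)foo|bar|baz)"

# Wrapping the extra alternation in a group makes the intent explicit.
grouped = r"(x)?(?(1)foo|(?:bar|baz))"

for name, pat in [("two", two_branches), ("three", three_branches), ("grouped", grouped)]:
    print(name, compiles(pat))
```

Other engines may make a different choice here; the point is only that the `|` in a conditional is not an ordinary alternation operator.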
From ph10@hermes.cam.ac.uk
On Sun, 18 Sep 2011, Father Chrysostomos via RT wrote:
No, I'm not.
That is certainly true, but to me, as a simple-minded person, it *looks* /^.*?(?(?=a)a(*THEN)b|c)/ Pattern I think the same should happen for this example: /^.*?(?(?=a)a(*THEN)b)c/ Further investigation shows up another issue. If (*THEN) appears in a /^.*?(a(*THEN)b)c/ Perl gives "no match"; PCRE currently matches. However, if we give it /^.*?(a(*THEN)b|z)c/ then Perl (5.012003) does match. That seems very counter-intuitive to The text in perlre for *THEN says "when backtracked into on failure, it /^.*?(z|a(*THEN)b)c/ shows that Perl does match in this case too.
I sometimes wonder whether these new backtracking verbs are going to Regards, -- |
From @nwc10
On Mon, Sep 19, 2011 at 05:49:19PM +0100, Philip Hazel wrote:
About the only possibly useful input I think I can have after reading =item C<(*THEN)> C<(*THEN:NAME)> is that I'm sadly thinking that you're right about the trouble/worth trade. I don't actually understand any of this. Which isn't a good sign, as based [Nothing uses (*THEN) in the core, other than the 14 lines of tests for it] Nicholas Clark |
From @ilmari
Nicholas Clark <nick@ccl4.org> writes:
Nor on CPAN, other than various re::engine:: tests. <http://grep.cpan.me/?q=%5C%28%5C*THEN%5C%29> -- |
From ph10@hermes.cam.ac.uk
On Tue, 20 Sep 2011, Nicholas Clark wrote:
<snip>
I've been thinking about this some more. My naive understanding of *THEN
A (*THEN) B
(?>A)B
If you go along with this, it follows that, if (*THEN) is within a
C (A (*THEN) B)
C ((?>A) B)
Now, it seems that Perl thinks differently to me. There seems to be the
A (B (*THEN) C)
In the first, Perl fails the match without any backtracking if C fails;
Regards,
PS On the matter of value/worth, some of the other backtracking verbs
Pattern: (*ACCEPT)a
Pattern: (*ACCEPT)
This is Perl 5.012003.
--
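The atomic-group reading described above, where `A (*THEN) B` behaves like `(?>A)B`, can be sketched in Python's `re` module. This is only an illustration of that reading, not of what Perl or PCRE actually do. Python's `re` has no backtracking verbs, so the atomic group is emulated with the lookahead-plus-backreference idiom `(?=(A))\1`, which works because a lookahead is never backtracked into once it has succeeded. The subject `"aaac"` is an assumption; the subject used in the thread is not shown.

```python
import re

SUBJECT = "aaac"  # assumed subject, chosen for illustration only

# Ordinary backtracking: the lazy a+? may keep growing until c matches.
plain = re.search(r"a+?c", SUBJECT)

# Atomic reading of a+?(*THEN)c, i.e. (?>a+?)c, emulated as (?=(a+?))\1c.
# Once a+? has matched a single "a", it cannot be extended on failure,
# so the match only succeeds at a start position directly before "ac".
atomic = re.search(r"(?=(a+?))\1c", SUBJECT)

print(plain.group(0))   # prints "aaac"
print(atomic.group(0))  # prints "ac"
```

Under this reading the shorter result `"ac"` is what one would expect from `/a+?(*THEN)c/`, which agrees with the PCRE behaviour mentioned earlier in the thread.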
From @ap
* Philip Hazel <ph10@hermes.cam.ac.uk> [2011-09-21 11:20]:
That is how I understand it as well. In which case the emphasis on alternation is not entirely a red It occurs to me that “THEN” is really the wrong thing to call A(*THEN)B translates to something like “match A as long as B also matches Regards, |
From @cpansprout
On Wed Sep 21 02:17:32 2011, ph10@hermes.cam.ac.uk wrote:
Oddly, I don’t find that counterintuitive at all. Do we need three |
From ph10@hermes.cam.ac.uk
On Wed, 21 Sep 2011, Father Chrysostomos via RT wrote:
Aha! One person's intuition is always another's total craziness. :-)
Is there a forum where we could ask the following question?
Folks, consider the pattern ^A(B(*THEN)C), where A, B, and C are
I will ask this question on the pcre-dev mailing list and see what
It seems that you intuitively think that a group without a | is a
I can, however, understand your logic; I have to say that to me it seems
I'd rather not create yet another version of prune/then! I *thought* I
*THEN fails the current alternation branch, and restarts at the next
*PRUNE fails the current match, but allows an advance to the next
*SKIP is like *PRUNE, but can skip forward more than one character.
*COMMIT fails the entire matching process, not allowing any further
That's fairly straightforward; the issue between us is what constitutes
Regards,
--
From @cpansprout
On Wed Sep 21 12:04:45 2011, ph10@hermes.cam.ac.uk wrote:
Come to think of it, we already have your version of (*THEN), as So your ^A(B(*THEN)C) translates into ^A((?>B)C). My understanding of (*THEN) (and perl’s implementation), would be [?>^A(B]C) where I’m using [?>...], as it doesn’t nest. So the latter meaning
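The difference between the two translations of `^A(B(*THEN)C)` above can be made concrete with a hedged sketch in Python's `re` module. The choices `A = a+?`, `B = a`, `C = b` and the subject `"aaab"` are assumptions picked so that the two readings disagree. The informal `[?>^A(B]C)` notation is interpreted here as one atomic span covering everything from the start of the branch up to the verb, which may not capture every detail of Perl's implementation. Atomic groups are emulated with `(?=(X))\1`, and the capture group around `B...C` is omitted for simplicity.

```python
import re

SUBJECT = "aaab"  # assumed subject, chosen so the two readings differ

# Narrow reading, ^A((?>B)C): only B is atomic, so A may still be
# extended after C fails.
narrow = re.search(r"^a+?(?=(a))\1b", SUBJECT)

# Wide reading, [?>^A(B]C): everything before the verb is atomic, so the
# first way of matching A and B is final and a failure of C is fatal.
wide = re.search(r"^(?=(a+?a))\1b", SUBJECT)

print(narrow.group(0) if narrow else None)  # prints "aaab"
print(wide.group(0) if wide else None)      # prints "None"
```

With the narrow reading `A` grows from `"a"` to `"aa"` after the first failure and the match succeeds; with the wide reading the first attempt consumes `"aa"`, `C` fails on the third `a`, and nothing can be retried.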
From ph10@hermes.cam.ac.uk
On Wed, 21 Sep 2011, Father Chrysostomos via RT wrote:
Sure ... but Perl has never shied away from having "more than one way to
You will be pleased to hear that Jeff Friedl, who has only just
At a practical level, I am not sure it is feasible to change PCRE to
Thanks for picking this up and having this discussion.
Regards,
--
From @cpansprout
On Fri Sep 23 02:15:05 2011, ph10@hermes.cam.ac.uk wrote:
I wouldn’t say it’s invalid, as Perl’s documentation is a little
From @cpansprout
On Sun Sep 18 13:33:28 2011, sprout wrote:
The change occurred with this commit:
commit d1c771f
VERB nodes in the regex engine should NOT be marked as JUMPABLE.
Inline Patch
diff --git a/regexec.c b/regexec.c
index 35ef8d4..ec4c4b0 100644
--- a/regexec.c
+++ b/regexec.c
@@ -252,7 +252,8 @@
OP(rn) == EVAL || \
OP(rn) == SUSPEND || OP(rn) == IFMATCH || \
OP(rn) == PLUS || OP(rn) == MINMOD || \
- OP(rn) == KEEPS || (PL_regkind[OP(rn)] == VERB) || \
+ OP(rn) == KEEPS || \
+ /*(PL_regkind[OP(rn)] == VERB && OP(rn) != PRUNE && OP(rn) !=
From ph10@hermes.cam.ac.uk
On Fri, 23 Sep 2011, Father Chrysostomos via RT wrote:
Thanks. No doubt that will eventually make it into a release and I'll see it. Regards, -- |
From @ikegami
On Sat, Sep 24, 2011 at 9:41 AM, Father Chrysostomos via RT <
According to the 5.14.1 docs, it shouldn't match. Note that if [the (*PRUNE)] operator is used and NOT inside of an Consider the pattern A (*PRUNE) B, where A and B are complex patterns. Until The following should match according to the docs: /a+(*THEN)c/ /a+?(?=c)(*THEN)c/ |
From @ikegami
On Mon, Sep 26, 2011 at 11:03 PM, Eric Brine <ikegami@adaelis.com> wrote:
Oops, that should be (*NEXT) at the beginning. |
From @ikegami
On Mon, Sep 26, 2011 at 11:04 PM, Eric Brine <ikegami@adaelis.com> wrote:
ARGH! (*THEN) |
From @cpansprout
On Mon Sep 26 20:04:14 2011, ikegami@adaelis.com wrote:
*at the current starting position* (*THEN) acts like (*COMMIT) in the example given above. |
From @ikegami
On Tue, Sep 27, 2011 at 1:22 AM, Father Chrysostomos via RT <
How can you check at what position it failed? I don't even know what that |
From @ikegami
On Tue, Sep 27, 2011 at 2:44 PM, Eric Brine <ikegami@adaelis.com> wrote:
Ok, I understand what the docs are *trying* to convey.
The docs aren't just unclear, they are self-contradictory. In two places,
- Failing outright is exactly the opposite of failing for only one start
From ph10@hermes.cam.ac.uk
On Mon, 26 Sep 2011, Father Chrysostomos via RT wrote:
Aha. Since we are quoting "man perlre", here's another bit of confusion: "(?PARNO)" "(?-PARNO)" "(?+PARNO)" "(?R)" "(?0)" Consider this pattern (ignore white space): ^.*? (?1) c (?(DEFINE) (a(*THEN)b) ) As it happens, PCRE processes recursions/subroutines differently to However, I'm still thinking about this whole issue with regard to PCRE Regards, -- |
From @demerphq
Sorry about the incredibly laggy reply. :-(
FC recently brought to my attention these threads, which I had managed
I am doing my best to work my way through them and will reply as I go.
On 20 September 2011 18:32, Philip Hazel <ph10@hermes.cam.ac.uk> wrote:
I think you are right that perl thinks differently to you. Probably
Perl's regex engine doesn't have a concept of a "group". It has a
There are no group parens in the pattern /foo|bar|baz/ however there is
The only part the group parens play is to tell the parser where the
You could say that (?: | ) are some kind of really complicated ternary
Ditto for the association of groups to quantifiers. In reality the
So, when we have this: (?:foo|bar)*
the (?: ) are part of the | operator in that they denote the beginning
Another aspect of this is that to perl /a(?:foo)b/ is the same as
So thinking of a (?: ... ) as a "thing" is wrong from the Perl
IOW, at an implementation level it is not the case that writing
Consider this output:
$ perl -Mre=debug -e'/[ac]|[bc]/'
$ perl -Mre=debug -e'/(?:[ac]|[bc])/'
The program generated is the same. There is no operator for GROUP,
(*THEN) affects how we transition from BRANCH to BRANCH, and if we
This gets hard to debug in Perl if you don't use charclasses with at
Anyway, I can see how it would be reasonable to consider (?:foo) to be
The intent of the (*THEN) operator is for something like this:
/ 1111 ( (?:101)+ (*THEN) 000 | (?: 110 | 011 )+ (*THEN) 000 ) 0101 /x
If we match say 1000 '101's, and we cannot match '000' no amount of
Here is an example of how this can help. wc -l is useful but crude
$ for i in 1 2 4 8 16 32 64 128 256; do perl -Mre=Debug,EXECUTE
The behavior of /A+? (*THEN) BC / appeared to be broken in some cases.
$ ./perl -le'"aaabc"=
The (?{}) disables this optimization, and makes it do the expected thing.
Your original bug report was about this:
./perl -Ilib -Mre=Debug,All,FLAGS -le'"ba"=~/^.*?(?(?=a)a|b(*THEN)c)/
which does not match, probably because of an optimisation in .*? handling.
An unanchored match /P/ should match the same thing as /^.*?P/, and
./perl -le'"ba"=~/((?(?=a)a|b(*THEN)c))/ and print "match: $&\ncond: $1"'
And if we removed the non-greedy quantifier modifier:
./perl -le'"ba"=~/^.*((?(?=a)a|b(*THEN)c))/ and print "match: $&\ncond: $1"'
Both of which match as expected, however this still doesn't match:
./perl -le'"ba"=~/^.*?((?(?=a)a|b(*THEN)c))/ and print "match: $&\ncond: $1"'
which I continue to investigate.
I have pushed b8f6efd to fix the case
I also pushed 337ff30 which shows the intflags.
You need to use -Mre=Debug,All,FLAGS to see the flags. I might change
Yves
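The expectation that an unanchored `/P/` finds the same match as `/^.*?P/` can be checked for verb-free patterns with a small sketch in Python's `re` module. This is an illustration under assumptions, not a test of Perl itself. Python's `re` only allows group references as conditions, so `(?(?=a)a|bc)` is rewritten with explicit lookaheads as `(?:(?=a)a|(?!a)bc)`, and the `(*THEN)` verb is left out entirely. The helper names are hypothetical, and `lazy_prefix_span` adds a capture group, so it would shift the numbering in patterns that use numbered backreferences.

```python
import re

# Verb-free stand-in for the conditional in the bug report (an assumption:
# the lookahead pair is taken to be equivalent to (?(?=a)a|bc)).
P = r"(?:(?=a)a|(?!a)bc)"


def unanchored_span(pattern, subject):
    """Span of the leftmost match found by scanning, or None."""
    m = re.search(pattern, subject, re.DOTALL)
    return m.span() if m else None


def lazy_prefix_span(pattern, subject):
    """Span matched by the pattern itself when written as ^.*?P, or None."""
    m = re.match(r"(.*?)(?:" + pattern + ")", subject, re.DOTALL)
    return (m.end(1), m.end()) if m else None


for subject in ("ba", "xxbc", "zzz"):
    print(subject, unanchored_span(P, subject), lazy_prefix_span(P, subject))
```

For `"ba"` both forms report the span of the trailing `a`, which is the behaviour the thread expects the `(*THEN)` version to show as well once the `.*?` optimisation is out of the way.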
From ph10@hermes.cam.ac.uk
On Sat, 22 Jun 2013, demerphq wrote:
Hi Yves, No problem; glad somebody has picked this up. For myself, I can no
Thanks for taking the time to give a long explanation of the way Perl In a recent release of PCRE we have formulated some reasonably
PCRE does now get this (perhaps not 100%, but to some extent) "right" in Regards, -- |
From @demerphq
On 23 June 2013 13:18, <ph10@hermes.cam.ac.uk> wrote:
FC pointed out a list of VERB issues and I am trying to work through
Definitely some of the things that have been reported have been bugs. A
Indeed. Both projects have a very long history and some things are
I'd like to hear about the differences. This thread makes me wonder how
Cool. I will try to find time to read the PCRE docs and look for Cheers, -- |
From ph10@hermes.cam.ac.uk
On Sun, 23 Jun 2013, demerphq wrote:
This is perhaps the relevant section of the "pcrepattern" man page: More than one backtracking verb If more than one backtracking verb is present in a pattern, the one (A(*COMMIT)B(*THEN)C|ABD) If A matches but B fails, the backtrack to (*COMMIT) causes the entire ...(*COMMIT)(*PRUNE)... If there is a matching failure to the right, backtracking onto (*PRUNE) Backtracking verbs in repeated groups PCRE differs from Perl in its handling of backtracking verbs in /(a(*COMMIT)b)+ac/ If the subject is "abac", Perl matches, but PCRE fails because the Backtracking verbs in assertions (*FAIL) in an assertion has its normal effect: it forces an immediate (*ACCEPT) in a positive assertion causes the assertion to succeed with- The other backtracking verbs are not treated specially if they appear Negative assertions are, however, different, in order to ensure that Backtracking verbs in subroutines These behaviours occur whether or not the subpattern is called recur- (*FAIL) in a subpattern called as a subroutine has its normal effect: (*ACCEPT) in a subpattern called as a subroutine causes the subroutine (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine (*THEN) skips to the next alternative in the innermost enclosing group Regards, -- |
@demerphq ping
Migrated from rt.perl.org#92898 (status was 'open')
Searchable as RT92898$