Only allow punct delimiter for regex subpattern
The experimental feature that allows wildcard subpatterns in finding
Unicode properties is supposed to allow only ASCII punctuation for
delimiters.  But if the delimiter was preceded by a backslash, the
check was skipped.  This commit fixes that.

It may be that we will eventually want to loosen the restriction and
allow a wider range of delimiters.  But until we have valid use-cases
that would push us in that direction, I don't want to get into
supporting stuff that we might later regret, such as invisible
characters for delimiters.  This feature is not really required for
programs to work, so I don't view it as necessary to be as general as
possible.
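
A minimal standalone sketch of the delimiter rule described above, for illustration only. The names `is_ascii_punct` and `starts_subpattern` are hypothetical stand-ins (the real code uses `isPUNCT_A` inside `Perl_parse_uniprop_string`), and the length check here is written in the sketch's own way rather than copied from the commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's isPUNCT_A(): ASCII punctuation only. */
static bool is_ascii_punct(char c)
{
    return (c >= '!' && c <= '/') || (c >= ':' && c <= '@')
        || (c >= '[' && c <= '`') || (c >= '{' && c <= '~');
}

/* Given the text after the '=' in \p{name=value}, decide whether it
 * starts a wildcard subpattern such as /bar/.  Hypothetical helper. */
static bool starts_subpattern(const char *value, size_t len)
{
    if (len == 0 || !is_ascii_punct(value[0]))
        return false;

    /* These punctuation characters can begin an ordinary property value,
     * so they never act as delimiters. */
    if (value[0] == '-' || value[0] == '+' || value[0] == '_' || value[0] == '{')
        return false;

    /* A backslash means the real delimiter is the next character, and
     * that character must itself be ASCII punctuation. */
    if (value[0] == '\\')
        return len > 1 && is_ascii_punct(value[1]);

    return true;
}
```

Under this sketch, `starts_subpattern("/bar/", 5)` is true, while `starts_subpattern("\\b5\\b", 6)` is false because `b` is not punctuation. That second case corresponds to the `\p{nv=\b5\b}` test added in this commit, which now falls through to an ordinary property lookup and fails with "Can't find Unicode property definition".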
khwilliamson committed Dec 11, 2019
1 parent 11fcdeb commit cd9d511
Showing 2 changed files with 10 additions and 6 deletions.
15 changes: 9 additions & 6 deletions regcomp.c
@@ -23290,10 +23290,13 @@ Perl_parse_uniprop_string(pTHX_
 /* Most punctuation after the equals indicates a subpattern, like
  * \p{foo=/bar/} */
 if ( isPUNCT_A(name[i])
-&& name[i] != '-'
-&& name[i] != '+'
-&& name[i] != '_'
-&& name[i] != '{')
+&& name[i] != '-'
+&& name[i] != '+'
+&& name[i] != '_'
+&& name[i] != '{'
+/* A backslash means the real delimitter is the next character,
+ * but it must be punctuation */
+&& (name[i] != '\\' || (i < name_len && isPUNCT_A(name[i+1]))))
 {
 /* Find the property. The table includes the equals sign, so we
  * use 'j' as-is */
@@ -23309,8 +23312,8 @@ Perl_parse_uniprop_string(pTHX_
 const char * pos_in_brackets;
 bool escaped = 0;

-/* A backslash means the real delimitter is the next character.
- * */
+/* Backslash => delimitter is the character following. We
+ * already checked that it is punctuation */
 if (open == '\\') {
 open = name[i++];
 escaped = 1;
1 change: 1 addition & 0 deletions t/re/reg_mesg.t
@@ -319,6 +319,7 @@ my @death =
 '/\x{100}(?(/' => 'Unknown switch condition (?(...)) {#} m/\\x{100}(?({#}/', # [perl #133896]
 '/(?[\N{KEYCAP DIGIT NINE}/' => '\N{} here is restricted to one character {#} m/(?[\\N{U+39.FE0F.20E3{#}}/', # [perl #133988]
 '/0000000000000000[\N{U+0.00}0000/' => 'Unmatched [ {#} m/0000000000000000[{#}\N{U+0.00}0000/', # [perl #134059]
+'/\p{nv=\b5\b}/' => 'Can\'t find Unicode property definition "nv=\\b5\\b" {#} m/\\p{nv=\\b5\\b}{#}/',
 );

 # These are messages that are death under 'use re "strict"', and may or may
