Unexpected return values for multiple regexps in a single statement #19058

Closed
rapilodev opened this issue Aug 17, 2021 · 13 comments

@rapilodev

Description
When the capture variable $1 from several regexp matches is combined in a single statement,
every use of $1 yields the capture of the last match.

Steps to Reproduce
perl -e 'use strict; use warnings; my $c = ("a" =~ m/(a)/ && $1) . ("b" =~ m/(b)/ && $1); print "$c\n";'
Output: bb

Expected behavior
The first regexp should return "a" because the string "a" contains the character "a".
The second regexp should return "b" because the string "b" contains the character "b".
Concatenating the two values should therefore yield "ab", not "bb".

It works if you force a copy of the value of $1 by quoting it:
perl -e 'use strict; use warnings; my $c = ("a" =~ m/(a)/ && "$1") . ("b" =~ m/(b)/ && "$1") ; print "$c\n";'
output: ab

If you capture the values directly, without going through $1, it works too. Here join is used to force list context.
perl -e 'use strict; use warnings; my $c = join("", ("a" =~ m/(a)/) , ("b" =~ m/(b)/) ) ; print "$c\n";'
output: ab

Perl configuration

  Platform:
    osname=linux
    osvers=4.19.0
    archname=x86_64-linux-gnu-thread-multi
    uname='linux localhost 4.19.0 #1 smp debian 4.19.0 x86_64 gnulinux '
    config_args='-Dmksymlinks -Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -ffile-prefix-map=/dummy/build/dir=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -flto=auto -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -flto=auto -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.32 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.32 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.32 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.32.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.32.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Ui_xlocale -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.32.1'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='x86_64-linux-gnu-gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion=''
    gccversion='10.3.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='x86_64-linux-gnu-gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=libc-2.33.so
    so=so
    useshrplib=true
    libperl=libperl.so.5.32
    gnulibc_version='2.33'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.32.1-3ubuntu2.1 in patchlevel.h
    DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories
    DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers
    DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798
    DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize
    DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd
    DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint
    DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO
    DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units
    DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site
    DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems
    DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters
    DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa
    DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4
    DEBPKG:debian/perldoc-pager - https://bugs.debian.org/870340 [rt.cpan.org #120229] Fix perldoc terminal escapes when sensible-pager is less
    DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/configure-regen - https://bugs.debian.org/762638 Regenerate Configure et al. after probe unit changes
    DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/disable-stack-check - https://bugs.debian.org/902779 [GH #16607] Disable debugperl stack extension checks for binary compatibility with perl
    DEBPKG:debian/perlbug-editor - https://bugs.debian.org/922609 Use "editor" as the default perlbug editor, as per Debian policy
    DEBPKG:debian/eu-mm-perl-base - https://bugs.debian.org/962138 Suppress an ExtUtils::MakeMaker warning about our non-default @INC
    DEBPKG:fixes/hurd-cachepropagate-test-fix - https://bugs.debian.org/963214 GNU/Hurd doesn't support SO_PROTOCOL
    DEBPKG:fixes/io_socket_ip_ipv6 - Disable getaddrinfo(3) AI_ADDRCONFIG for localhost and IPv4 numeric addresses
    DEBPKG:disable-libperl-tests -
    DEBPKG:CVE-2021-36770.patch - [PATCH] mitigate @INC pollution when loading ConfigLocal
  Built under linux
  Compiled at Aug  2 2021 12:24:15
  @INC:
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.32.1
    /usr/local/share/perl/5.32.1
    /usr/lib/x86_64-linux-gnu/perl5/5.32
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl-base
    /usr/lib/x86_64-linux-gnu/perl/5.32
    /usr/share/perl/5.32
    /usr/local/lib/site_perl

@Grinnz
Contributor

Grinnz commented Aug 17, 2021

This could be a bug but is likely a stack issue related to the dynamic scope of the number variables. Generally you should only expect them to be set correctly in the full statement directly after the one containing the m// or s/// (and only if they successfully match).

@rapilodev
Author

rapilodev commented Aug 17, 2021

> This could be a bug but is likely a stack issue related to the dynamic scope of the number variables. Generally you should only expect them to be set correctly in the full statement directly after the one containing the m// or s/// (and only if they successfully match).

Thanks for your fast answer! The question I have is when and in what order the sub-statements are evaluated and why they are not handled separately. I'm not sure if this is really a bug, but it's unlike anything I thought I knew about perl.

Maybe the issue can be shown more clearly by defining an array containing two separate regexp matches.
perl -e 'use strict; use warnings; my @c = ( ("a" =~ m/(a)/ && $1) , ("b" =~ m/(b)/ && $1) ) ; print join("",@c)."\n";'
output: bb
@c defines an array of two separate regexps with no common scope. So I would expect two separate result values as I see no connection between both member statements.

Here again, if I assign the values to a copy, it works:
perl -e 'use strict; use warnings; my @c = ( my $va = ("a" =~ m/(a)/ && $1) , (my $vb = "b" =~ m/(b)/ && $1) ) ; print join("",@c)."\n";'
output: ab

Even if $1 is somehow reused, I would expect the array elements to be matched and evaluated sequentially from left to right.
Here it looks to me as if all array elements are first matched and evaluated, but without replacing the $1 variable with its contents. In a second step all $1 variables in all elements are replaced by the content of the last calculated $1 value.

@Grinnz
Contributor

Grinnz commented Aug 17, 2021

Basically within a single statement, direct use of $1 refers to the same global variable and the final value used for it gets updated by the time it's returned for the assignment (while still on the stack). If you quote it, it is instead a copy of the global on the stack. Like I said this may be a bug but it is due to how the stack works within a single expression.

@Grinnz
Contributor

Grinnz commented Aug 17, 2021

In other words, @c is not an array of two separate regexes, it is an array to which you assign a list. That list is constructed by evaluating two expressions, and the result of both expressions is $1 which refers to the same global variable.

@rapilodev
Author

Thank you very much! This helps me a lot to understand why it works the way it does.

@ilmari
Member

ilmari commented Aug 18, 2021

This has nothing to do with $1 being special, it's because the . operator gets passed the variables, and doesn't read their values until after both sides have been evaluated:

$ perl -Mstrict -Mwarnings -E 'my $x; my $c = ($x = "a") . ($x = "b"); say $c'
bb

It "works" with double-quoted values because the string interpolation reads the value immediately and returns a temporary with a copy of it as it was at the time.

@Grinnz
Contributor

Grinnz commented Aug 18, 2021

In a way this is similar to how using a variable in the same expression where auto-increment is also used on it leads to undefined behavior. https://perldoc.perl.org/perlop#Auto-increment-and-Auto-decrement

@jkeenan
Contributor

jkeenan commented Aug 24, 2021

> Thank you very much! This helps me a lot to understand why it works the way it does.

The OP's concerns appear to have been addressed. Is this ticket closable? Is some documentation change warranted?

@jkeenan jkeenan added the "Closable?" label (We might be able to close this ticket, but we need to check with the reporter) Aug 24, 2021
@demerphq
Collaborator

demerphq commented Aug 24, 2021 via email

@iabyn
Contributor

iabyn commented Aug 30, 2021 via email

@jkeenan jkeenan removed the "Closable?" label (We might be able to close this ticket, but we need to check with the reporter) Sep 13, 2021
@khwilliamson
Contributor

I think we can close this? Any objections?

@hvds
Contributor

hvds commented Mar 23, 2022

> I think we can close this? Any objections

The doc team (are they still about?) might want to verify that the underlying concept here is clearly documented wherever it needs to be. But in respect of perl's behaviour I think this is closable.

@jkeenan
Contributor

jkeenan commented Mar 24, 2022

> > I think we can close this? Any objections
>
> The doc team (are they still about?)

Unfortunately, no.

> might want to verify that the underlying concept here is clearly documented wherever it needs to be. But in respect of perl's behaviour I think this is closable.

Agreed; closing.

@jkeenan jkeenan closed this as completed Mar 24, 2022