Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many warnings in regcomp.c can occur twice #14061

Closed
p5pRT opened this issue Sep 1, 2014 · 13 comments
Closed

Many warnings in regcomp.c can occur twice #14061

p5pRT opened this issue Sep 1, 2014 · 13 comments

Comments

@p5pRT
Copy link

p5pRT commented Sep 1, 2014

Migrated from rt.perl.org#122671 (status was 'resolved')

Searchable as RT122671$

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @khwilliamson

This is a bug report for perl from khw@​khw.(none),
generated with the help of perlbug 1.40 running under perl 5.21.4.


The regex compiler has many of its possible warnings emitted in the
first of the two main passes. The problem is that this pass can be
run twice causing the same warning to be emitted twice. This repeated
pass occurs when something the parser encounters makes it decide it
needs to re-encode the pattern as UTF-8. At that point it cleans up
what it's doing at the moment, upgrades the pattern, and restarts. Any
warning already raised by the time it decides to upgrade will be raised
again. (If the pattern is already in UTF-8 at the start of parsing,
there is no a problem.)



Flags​:
  category=core
  severity=low


Site configuration information for perl 5.21.4​:

Configured by khw at Sun Aug 31 08​:55​:05 MDT 2014.

Summary of my perl5 (revision 5 version 21 subversion 4) configuration​:
  Commit id​: 9a7aaf5
  Platform​:
  osname=linux, osvers=3.13.0-35-generic,
archname=x86_64-linux-thread-multi-ld
  uname='linux khw 3.13.0-35-generic #62-ubuntu smp fri aug 15
01​:58​:42 utc 2014 x86_64 x86_64 x86_64 gnulinux '
  config_args='-des -Uversiononly -Dprefix=/home/khw/blead -Dusedevel
-D'optimize=-g' -A'optimize=-g' -A'optimize=-O0'
-Accflags='-DPERL_BOOL_AS_CHAR' -Dman1dir=none -Dman3dir=none
-DDEBUGGING -DDEBUGGING -Dcc=g++ -Dusemorebits -Dusethreads'
  hint=recommended, useposix=true, d_sigaction=define
  useithreads=define, usemultiplicity=define
  use64bitint=define, use64bitall=define, uselongdouble=define
  usemymalloc=n, bincompat5005=undef
  Compiler​:
  cc='g++', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_BOOL_AS_CHAR
-fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_FORTIFY_SOURCE=2',
  optimize='-g -g -O0',
  cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_BOOL_AS_CHAR -fwrapv
-DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include'
  ccversion='', gccversion='4.8.2', gccosandvers=''
  intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
  d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=16, longdblkind=3
  ivtype='long', ivsize=8, nvtype='long double', nvsize=16,
Off_t='off_t', lseeksize=8
  alignbytes=16, prototype=define
  Linker and Libraries​:
  ld='g++', ldflags =' -fstack-protector -L/usr/local/lib'
  libpth=/usr/include/c++/4.8 /usr/include/x86_64-linux-gnu/c++/4.8
/usr/include/c++/4.8/backward /usr/local/lib
/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
  libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
  libc=libc-2.19.so, so=so, useshrplib=false, libperl=libperl.a
  gnulibc_version='2.19'
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
  cccdlflags='-fPIC', lddlflags='-shared -g -g -O0 -L/usr/local/lib
-fstack-protector'


@​INC for perl 5.21.4​:
  /home/khw/perl/blead/lib
  /home/khw/blead/lib/perl5/site_perl/5.21.4/x86_64-linux-thread-multi-ld
  /home/khw/blead/lib/perl5/site_perl/5.21.4
  /home/khw/blead/lib/perl5/5.21.4/x86_64-linux-thread-multi-ld
  /home/khw/blead/lib/perl5/5.21.4
  /home/khw/blead/lib/perl5/site_perl/5.21.3
  /home/khw/blead/lib/perl5/site_perl/5.21.2
  /home/khw/blead/lib/perl5/site_perl/5.21.1
  /home/khw/blead/lib/perl5/site_perl/5.20.0
  /home/khw/blead/lib/perl5/site_perl/5.19.12
  /home/khw/blead/lib/perl5/site_perl/5.19.11
  /home/khw/blead/lib/perl5/site_perl/5.19.10
  /home/khw/blead/lib/perl5/site_perl
  .


Environment for perl 5.21.4​:
  HOME=/home/khw
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)

PATH=/home/khw/bin​:/home/khw/perl5/perlbrew/bin​:/home/khw/print/bin​:/bin​:/usr/local/sbin​:/usr/local/bin​:/usr/sbin​:/usr/bin​:/sbin​:/usr/games​:/usr/local/games​:/home/khw/iands/www​:/home/khw/cxoffice/bin
  PERL5OPT=-w
  PERL_BADLANG (unset)
  PERL_POD_PEDANTIC=1
  SHELL=/bin/ksh

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @cpansprout

On Sun Aug 31 20​:43​:12 2014, public@​khwilliamson.com wrote​:

This is a bug report for perl from khw@​khw.(none),
generated with the help of perlbug 1.40 running under perl 5.21.4.

-----------------------------------------------------------------
The regex compiler has many of its possible warnings emitted in the
first of the two main passes. The problem is that this pass can be
run twice causing the same warning to be emitted twice. This repeated
pass occurs when something the parser encounters makes it decide it
needs to re-encode the pattern as UTF-8. At that point it cleans up
what it's doing at the moment, upgrades the pattern, and restarts.
Any
warning already raised by the time it decides to upgrade will be
raised
again. (If the pattern is already in UTF-8 at the start of parsing,
there is no a problem.)

C.f. commit 688e039. Maybe a similar technique could be used to fix the others.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @jkeenan

On Sun Aug 31 20​:43​:12 2014, public@​khwilliamson.com wrote​:

This is a bug report for perl from khw@​khw.(none),
generated with the help of perlbug 1.40 running under perl 5.21.4.

-----------------------------------------------------------------
The regex compiler has many of its possible warnings emitted in the
first of the two main passes. The problem is that this pass can be
run twice causing the same warning to be emitted twice.

Can you provide some examples?

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @khwilliamson

On 08/31/2014 11​:03 PM, Father Chrysostomos via RT wrote​:

On Sun Aug 31 20​:43​:12 2014, public@​khwilliamson.com wrote​:

This is a bug report for perl from khw@​khw.(none),
generated with the help of perlbug 1.40 running under perl 5.21.4.

-----------------------------------------------------------------
The regex compiler has many of its possible warnings emitted in the
first of the two main passes. The problem is that this pass can be
run twice causing the same warning to be emitted twice. This repeated
pass occurs when something the parser encounters makes it decide it
needs to re-encode the pattern as UTF-8. At that point it cleans up
what it's doing at the moment, upgrades the pattern, and restarts.
Any
warning already raised by the time it decides to upgrade will be
raised
again. (If the pattern is already in UTF-8 at the start of parsing,
there is no a problem.)

C.f. commit 688e039. Maybe a similar technique could be used to fix the others.

The problem is that the first pass can be started anywhere during it.
If the restart occurs before it got to the place where the message would
have been output, and we rely on a global flag, the flag will be set at
the time the message would be output in the re-parse, so it will never
get output. (I haven't investigated, but I hope that 688e039
doesn't suffer from the same problem, that is that the study doesn't
abort during the middle of itself, and then the restudy happens at
varying possible points of completion of the original study.)

A generic example of this happening is​:
$ perl -Ilib -le 'qr/(?-o​:foo\x{100})/'
Useless (?-o) - don't use /o modifier in regex; marked by <-- HERE in
m/(?-o <-- HERE :foo\x{100})/ at -e line 1.
Useless (?-o) - don't use /o modifier in regex; marked by <-- HERE in
m/(?-o <-- HERE :foo\x{100})/ at -e line 1.

The solution that seems the simplest is to just defer warnings until
pass2. That works for all but the parts of pass2 that can get repeated.
  I think the only such part is study_chunk.

On closer examination, I found that a number of warnings currently are
deferred to pass 2, and hence are not subject to this bug.

However, actual errors should be tested for and compilation aborted in
the first pass. That means that if a pattern has both errors and
warnings, the warnings will not be displayed until the errors get fixed.
  That really is no different than if a pattern has two errors. The 2nd
isn't shown until the first one gets fixed, as compilation is
(necessarily) stopped after that first one. But it would be a change in
behavior from current output, albeit only for fatally flawed patterns

@p5pRT
Copy link
Author

p5pRT commented Sep 1, 2014

From @cpansprout

On Mon Sep 01 08​:41​:11 2014, public@​khwilliamson.com wrote​:

However, actual errors should be tested for and compilation aborted in
the first pass. That means that if a pattern has both errors and
warnings, the warnings will not be displayed until the errors get
fixed.
That really is no different than if a pattern has two errors. The
2nd
isn't shown until the first one gets fixed, as compilation is
(necessarily) stopped after that first one. But it would be a change
in
behavior from current output, albeit only for fatally flawed patterns

I think that would be an acceptable change.

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 4, 2014

From @cpansprout

On Mon Sep 01 08​:41​:11 2014, public@​khwilliamson.com wrote​:

The solution that seems the simplest is to just defer warnings until
pass2.

I have a question​: Does that mean non-utf8 patterns might miss the warnings because they don’t need a second pass?

--

Father Chrysostomos

@p5pRT
Copy link
Author

p5pRT commented Sep 4, 2014

From @khwilliamson

On 09/03/2014 06​:54 PM, Father Chrysostomos via RT wrote​:

On Mon Sep 01 08​:41​:11 2014, public@​khwilliamson.com wrote​:

The solution that seems the simplest is to just defer warnings until
pass2.

I have a question​: Does that mean non-utf8 patterns might miss the warnings because they don’t need a second pass?

No. Everything has what is called a "second" pass. But that pass could
actually be a 3rd pass. It's because what's called the first pass may
have to be done twice. Ugh!. The first pass is to calculate how much
space is needed for the pattern. If the pattern isn't already in UTF-8,
and somewhere in the middle of calculating the space, it discovers it
needs to be encoded in UTF-8 (typically by finding something like a
\x{foo} that means a literal character above ordinal 255), it upgrades
the pattern to UTF-8, which may expand the upper-latin1 characters that
have already been processed into two bytes each, making the current
length calculation wrong. So it starts over, redoing the whole "first"
pass.

Only the portion of the parsing up to the point where the too-big
character is found is wasted. The worst case is if it is the final
character in the pattern.

@p5pRT
Copy link
Author

p5pRT commented Sep 5, 2014

From @demerphq

On 4 September 2014 03​:37, Karl Williamson <public@​khwilliamson.com> wrote​:

On 09/03/2014 06​:54 PM, Father Chrysostomos via RT wrote​:

On Mon Sep 01 08​:41​:11 2014, public@​khwilliamson.com wrote​:

The solution that seems the simplest is to just defer warnings until
pass2.

I have a question​: Does that mean non-utf8 patterns might miss the
warnings because they don’t need a second pass?

No. Everything has what is called a "second" pass. But that pass could
actually be a 3rd pass. It's because what's called the first pass may have
to be done twice. Ugh!. The first pass is to calculate how much space is
needed for the pattern. If the pattern isn't already in UTF-8, and
somewhere in the middle of calculating the space, it discovers it needs to
be encoded in UTF-8 (typically by finding something like a \x{foo} that
means a literal character above ordinal 255), it upgrades the pattern to
UTF-8, which may expand the upper-latin1 characters that have already been
processed into two bytes each, making the current length calculation
wrong. So it starts over, redoing the whole "first" pass.

Only the portion of the parsing up to the point where the too-big
character is found is wasted. The worst case is if it is the final
character in the pattern.

One day I or someone will get around to making the compiler build an optree
in the first pass, and then do an optimising second pass, and then a opcode
generating third pass.

Instead of what we do now which is a memory sizing first pass, possibly
executed twice, followed by an opcode generating second pass, followed by
an optimizing third pass.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2014

From @khwilliamson

Fixed in blead by 499333d
--
Karl Williamson

@p5pRT
Copy link
Author

p5pRT commented Sep 7, 2014

@khwilliamson - Status changed from 'open' to 'pending release'

@p5pRT
Copy link
Author

p5pRT commented Jun 2, 2015

From @khwilliamson

Thanks for submitting this ticket

The issue should be resolved with the release today of Perl v5.22. If you find that the problem persists, feel free to reopen this ticket

--
Karl Williamson for the Perl 5 porters team

@p5pRT p5pRT closed this as completed Jun 2, 2015
@p5pRT
Copy link
Author

p5pRT commented Jun 2, 2015

@khwilliamson - Status changed from 'pending release' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant