regexp problem in perl 5.6.1 and 5.8.4

Thomas Stauffer · Jun 4, 2004

I have done some Perl programming in the past, but I am by no means an
expert. I am currently working on changing some code written some time
ago by an employee no longer with the company. The code is currently
running under 5.005.02. I am making changes and adding some ucs2 ->
utf8 conversion. I want to run the code under Perl 5.8.4 to take
advantage of Perl's internal Unicode support. At any rate, there is a
regular expression in the code that works fine under 5.005.02 but loops
under 5.6.1 and above. The following code illustrates the problem:

$orig_string = 'JKXXAF';

$regex = qr {\G
# Match as many characters as possible
# that can be passed thru as-is
([^\x00-\xFF]+)

# Then try to match $A1 and next two bytes
| (@..)

# Otherwise just get the next byte
| (.)
}sx;

print "regex = $regex\n";

while ($orig_string =~ /$regex/g) {
print "\$1=$1\n";
print "\$2=$2\n";
print "\$3=$3\n";
}

The problem seems to be with the use of the \G assertion. If I take it
out, the regular expression works the same in all versions of Perl.
However, since I did not write the code and the programmer who did was
considerably more experienced using Perl than I am, I am hesitant just
to remove it. Anyhow, I have been looking at this for several days
without success. My Perl expert suggested I post it to this forum. Any
help would be greatly appreciated.

Following are the details of the version of Perl I'm using:

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
Platform:
osname=solaris, osvers=2.8, archname=sun4-solaris
uname='sunos cwu21awu 5.8 generic_108528-29 sun4u sparc
sunw,sun-blade-100 '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='/opt/SUNWspro/bin/cc', ccflags =' -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
optimize='-O',
cppflags=''
ccversion='Sun WorkShop 6 update 2 C 5.3 Patch 111679-08
2002/05/09', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='/opt/SUNWspro/bin/cc', ldflags =' -L/usr/lib -L/usr/ccs/lib
-L/opt/SUNWspro/WS6U2/lib -L/usr/local/lib '
libpth=/usr/lib /usr/ccs/lib /opt/SUNWspro/WS6U2/lib /usr/local/lib
libs=-lsocket -lnsl -ldl -lm -lc
perllibs=-lsocket -lnsl -ldl -lm -lc
libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
cccdlflags='-KPIC', lddlflags='-G -L/usr/lib -L/usr/ccs/lib
-L/opt/SUNWspro/WS6U2/lib -L/usr/local/lib'

Characteristics of this binary (from libperl):
Compile-time options: USE_LARGE_FILES
Built under solaris
Compiled at Apr 22 2004 16:07:19
@INC:
/usr/local/perl5/lib/5.8.4/sun4-solaris
/usr/local/perl5/lib/5.8.4
/usr/local/perl5/lib/site_perl/5.8.4/sun4-solaris
/usr/local/perl5/lib/site_perl/5.8.4
/usr/local/perl5/lib/site_perl

Anno Siegel · Jun 5, 2004

The \G is really not needed for the function of the loop. //g in scalar
context resumes each match attempt at pos(), where the previous match
ended, and since the last alternative matches any single character, each
match starts exactly there.
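
As a rough check of this, the sketch below runs the same three alternatives without \G in a scalar //g loop and counts how often a match starts anywhere other than where the previous one ended. The sample string, the variable names, and the escaped \@ (used only to rule out array interpolation) are my own assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Sample input chosen for illustration; it is not from the original post.
my $string = 'J@XXAF';

# Same three alternatives as the original pattern, with no \G.
my $regex = qr{
      ([^\x00-\xFF]+)   # run of characters above 0xFF
    | (\@..)            # an at sign plus the next two characters
    | (.)               # otherwise a single character
}sx;

my @tokens;
my $expected_start = 0;   # where the next match should begin
my $gaps = 0;             # matches that began somewhere else

while ($string =~ /$regex/g) {
    # $-[0] is the start offset of this match, $+[0] its end offset.
    $gaps++ if $-[0] != $expected_start;
    push @tokens, substr($string, $-[0], $+[0] - $-[0]);
    $expected_start = pos($string);
}

print join(',', @tokens), "\n";
print "gaps=$gaps\n";
```

For this pattern every character is consumed by one of the alternatives, so no match ever has to skip ahead to find a starting point.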

Note that adding \G only anchors the first alternative explicitly;
the second and third are free to match anywhere. One could argue
that scalar //g should still anchor the whole match, so the current
behavior would be a bug. In any case, the behavior in the presence of
both \G and //g appears to have changed.
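
The precedence point can be seen in isolation with a much smaller pattern. This is only a sketch: the two-character input and the variable names are made up for illustration, and the result described is what I would expect from a current Perl, not necessarily from 5.6.1 or 5.8.4.

```perl
use strict;
use warnings;

# Hypothetical input; pos() is unset, so \G can only match at offset 0.
my $text = 'xb';

# Ungrouped: parsed as "\Ga" or "b", so only the first branch is anchored.
my $ungrouped = $text =~ /\Ga|b/
    ? 'matched at offset ' . $-[0]
    : 'no match';

# Grouped: \G applies to both branches, and neither fits at offset 0.
my $grouped = $text =~ /\G(?:a|b)/
    ? 'matched at offset ' . $-[0]
    : 'no match';

print "ungrouped: $ungrouped\n";
print "grouped:   $grouped\n";
```

The ungrouped form finds the "b" at offset 1 because its second branch carries no anchor, while the grouped form fails outright.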

Adding non-capturing parentheses around the alternative fixes the
behavior:

my $regex = qr { \G
(?:
# Match as many characters as possible
# that can be passed thru as-is
([^\x00-\xFF]+)

# Then try to match $A1 and next two bytes
| (@..)

# Otherwise just get the next byte
| (.)
)
}sx;

I'd say you can safely leave the \G off. If you want to keep it, add
the grouping; otherwise it doesn't make much sense.
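
If you want some reassurance before dropping it, a comparison along these lines might help. It is a sketch under assumptions: the helper name tokens(), the sample strings, and the escaped \@ are mine, and it only compares the grouped \G pattern against the same alternatives without \G.

```perl
use strict;
use warnings;

# Grouped pattern with \G, and the same alternatives with \G dropped.
my $with_g    = qr{ \G (?: ([^\x00-\xFF]+) | (\@..) | (.) ) }sx;
my $without_g = qr{        ([^\x00-\xFF]+) | (\@..) | (.)   }sx;

# Hypothetical helper: split a string into tokens with a scalar //g loop.
sub tokens {
    my ($regex, $string) = @_;
    my @out;
    while ($string =~ /$regex/g) {
        push @out, substr($string, $-[0], $+[0] - $-[0]);
    }
    return @out;
}

# Sample inputs chosen for illustration; one has characters above 0xFF.
my @samples = ('JKXXAF', 'J@XXAF', "\x{263A}\x{263A}J\@XX", "ab\ncd");

my $differences = 0;
for my $sample (@samples) {
    my @left  = tokens($with_g,    $sample);
    my @right = tokens($without_g, $sample);
    $differences++ if join('|', @left) ne join('|', @right);
    print scalar(@left), " tokens with \\G, ",
          scalar(@right), " without\n";
}
print "differences=$differences\n";
```

Running the real input data through both patterns in the same way would be a stronger check than these few samples.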

Anno
