Perl Fails To List All The Multiple Matches In The Same Line?


Cibalo

Hello,

I would like to list all the 5-digit zip codes in my database, in
which a line may contain more than one zip code. I created a
test database, testdb, for testing as follows.

# echo -e 'zip1 10036; zip2 48226; zip3 94128\nzip4 V8Y 1L1; zip5 400069\nzip6 \nzip7 12345' > testdb
# cat testdb
zip1 10036; zip2 48226; zip3 94128
zip4 V8Y 1L1; zip5 400069
zip6
zip7 12345
# perl -wnl -e '/\b[0-9]{5}\b/g and print "$.: $&";' testdb
1: 10036
4: 12345
# grep -now -e '[0-9]\{5\}' testdb
1:10036
48226
94128
4:12345
#

Even with the global modifier, the above perl script lists only the
first match when a line contains multiple matches. But I can
make it work with grep, as listed above.

What's wrong with my perl script? What am I missing?

# perl --version; grep --version
This is perl, v5.10.0 built for i386-linux-thread-multi
Copyright 1987-2007, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

grep (GNU grep) 2.5.1
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#

Thank you very much for your assistance.

Best Regards,

cibalo
 
John W. Krahn

Cibalo said:
Even with the global modifier, the above perl script lists only the
first match when a line contains multiple matches. But I can
make it work with grep, as listed above.

The problem is that even with the global option the pattern is evaluated
in scalar context and so will only match once. You need to either match
in list context:

$ echo "zip1 10036; zip2 48226; zip3 94128
zip4 V8Y 1L1; zip5 400069
zip6
zip7 12345
" | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'
1: 10036
1: 48226
1: 94128
4: 12345


Or match all patterns in scalar context:

$ echo "zip1 10036; zip2 48226; zip3 94128
zip4 V8Y 1L1; zip5 400069
zip6
zip7 12345
" | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'
1: 10036
1: 48226
1: 94128
4: 12345
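
For anyone who cannot paste the one-liners into their shell, here is a minimal script-form sketch of the two approaches above. The sub names zips_list_context and zips_scalar_context and the @sample array are made up for illustration; only the regex comes from the thread. The point it tries to show: in the original `-n` one-liner, `/.../g and print` is evaluated once per input line in scalar context, so it only ever reports the first match on each line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List context: m//g returns every match on the line at once.
sub zips_list_context {
    my ($line) = @_;
    my @zips = $line =~ /\b[0-9]{5}\b/g;
    return @zips;
}

# Scalar context: each evaluation of m//g finds the next match,
# so the match has to be driven by a loop.
sub zips_scalar_context {
    my ($line) = @_;
    my @zips;
    while ($line =~ /\b([0-9]{5})\b/g) {
        push @zips, $1;
    }
    return @zips;
}

# Hypothetical sample data mirroring the testdb file from the question.
my @sample = (
    'zip1 10036; zip2 48226; zip3 94128',
    'zip4 V8Y 1L1; zip5 400069',
    'zip6 ',
    'zip7 12345',
);

my $line_no = 0;
for my $line (@sample) {
    $line_no++;
    print "$line_no: $_\n" for zips_list_context($line);
}
```

Both subs should return the same list for a given line; which one to use is mostly a matter of taste, though the while form avoids building the whole list when only one match at a time is needed.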



John
 
sln

Even with the global modifier, the above perl script lists only the
first match when a line contains multiple matches. But I can
make it work with grep, as listed above.

The problem is that even with the global option the pattern is evaluated
in scalar context and so will only match once. You need to either match
in list context:

$ echo "zip1 10036; zip2 48226; zip3 94128
zip4 V8Y 1L1; zip5 400069
zip6
zip7 12345
" | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'
                             ^^^^^^^^^^^^^^^^^^^
Careful, someone might accuse you of obfuscation.

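# reads from a __DATA__ section at the end of the script
while (<DATA>)
{
    print;                           # echo the input line
    @_ = $_ =~ /\b[0-9]{5}\b/g;      # list context: collect every match
    for (@_)
    {
        print "$.: $_\n";
    }
}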

-sln
 
sln

John W. Krahn said:
Or match all patterns in scalar context:

$ echo "zip1 10036; zip2 48226; zip3 94128
zip4 V8Y 1L1; zip5 400069
zip6
zip7 12345
" | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'
                             ^^^^^^^^^^^^^^^^^^^^^^^
1: 10036
1: 48226
1: 94128
4: 12345



John

I always enjoy (and marvel at) seeing Unix 1 liner shell
compositions here. Seems so at ease and natural. I just got Windyo'z.
When I cut and paste these 1 liners (even though my shell does 'echo')
each line is treated as a new command, even when I batch it.
Unfortunately, the {'"} syntax is also different under Windows (and I
have XP, the great).

Why can't windows do unix?

Oh well, I have to settle for the 'gist' and test using a .pl file.
This last one works as expected; the first (list context) is slightly obfuscated,
or would be to the OP, who never got past the meaning of the /g switch.

Btw, nice explanation John.

while (<DATA>)
{
    while (/\b([0-9]{5})\b/g)
    {
        print "$.: $1\n";
    }
}

-sln
 
sln

Funny how
for /\b[0-9]{5}\b/g
works, but this
@_ = $_ =~ /\b[0-9]{5}\b/g;
for ()
doesn't.

As though the shortcut's got shorted.

-sln
 
Jürgen Exner

"Why can't a Ford do a Chevy?"
In this case, it can. Install cygwin.

That doesn't "do Unix" (whatever that is supposed to mean).
It merely provides the typical Unix utilities in the Windows
environment.

jue
 
sln

"Why can't a Ford do a Chevy?"

In this case, the only thing a 'Chevy' can do is Camaro.
Ford can do anything. Buy a horse (Mustang).
That doesn't "do Unix" (whatever that is supposed to mean).

Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why forward slashes?
Is /dir/dir/dir/not_\dir available?
It merely provides the typical Unix utilities in the Windows
environment.

This means a compiler right?

-sln
 
sln

No it means binaries (and in typical unix tradition, also a compiler,
it's one of the binaries).

M4

Since I have to learn everything on my own (because class is too slow),
they (an employer) would have to pay me (unix deliverables) while I am
forced to learn. To shift to different OSes all the time takes a lot
out of me. I can deliver unix code with a compiler that keeps me in line.
I'm so lazy I make the compiler do my work. Make it tell me my errors,
take me to my errors, take me to the docs, make it fix it for me.
IDEs are my slaves, they get out of line ... I pop em in the mout'

-sln
 
sln

Windows uses back slashes while unix uses forward slashes. A mainframe
uses periods (.).

Therefore unix's /dir/dir/dir/not_dir
is windows c:\dir\dir\dir\file

In your Perl code you should use forward slashes even when on windows.
For example:

open(FH, '<', 'c:/dir/dir/dir/file') or die ........


Although not_dir does not mean a file for unix. For example, not_dir can
be a link to another file or directory. But I won't go into that here.

Hey thanks! I already had an idea 'not_dir can be a link to another file or directory',
but I didn't go down that path when I scanned that line somewhere.

The ///// slashes are a Perl comfort thing, unfortunately, intrinsic separators assigned
to my $sep are platform useless given the former thanks. But, OS normalization is, like
you said, maybe not guaranteed. I just hate OS'.

-sln
 
sln

I always enjoy (and marvel at) seeing Unix 1 liner shell
compositions here. Seems so at ease and natural. I just got Windyo'z.
When I cut and paste these 1 liners (even though my shell does 'echo')
each line is treated as a new command, even when I batch it.
Unfortunately, the {'"} syntax is also different under Windows (and I
have XP, the great).

You just need to adjust the one-liner a bit to make the unix one-liners
work on windows.

unix:
$ echo "zip1 10036; zip2 48226; zip3 94128" | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'
1: 10036
1: 48226
1: 94128


windows:
d:\>echo "zip1 10036; zip2 48226; zip3 94128" | perl -nle "print qq($.: $1) while /\b([0-9]{5})\b/g"
1: 10036
1: 48226
1: 94128

I tend to use the qq() syntax when I need a double-quote in the
one-liner for windows.
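
A small sketch of why the qq() trick works, in case it helps: qq() is Perl's generic double-quote operator, so it interpolates exactly like "..." but leaves the " character free for the shell to use around the whole program. The sub names below are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Ordinary double-quoted string.
sub label_quotes {
    my ($line_no, $zip) = @_;
    return "$line_no: $zip";
}

# Same interpolation via qq(), no double-quote characters needed.
sub label_qq {
    my ($line_no, $zip) = @_;
    return qq($line_no: $zip);
}

# Other delimiter pairs work as well, e.g. braces.
sub label_qq_braces {
    my ($line_no, $zip) = @_;
    return qq{$line_no: $zip};
}

print label_quotes(1, '10036'), "\n";
print label_qq(1, '10036'), "\n";
print label_qq_braces(1, '10036'), "\n";
```

All three subs should produce the same string, so the choice between them only depends on which quote characters the surrounding shell has already claimed.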

Thanks Len, I really appreciate that!

-sln
 
Jürgen Exner

Wrong question. Question should have been "Why does Windows not use
forward slashes?" After all Unix predates Windows by a decade and a
half.

Sure, why not?
I'm not absolutely certain but I think this should be the same name as
just not_dir. The \d is not a special character (unlike \t or \r),
therefore the escape should be ignored.
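
Assuming the name were written inside a Perl string (the post does not say whether the shell or Perl is meant), a quick way to check the guess is to compare the double-quoted and single-quoted forms. This is only a sketch; the sub names are invented for the example.

```perl
use strict;
use warnings;

# Double quotes: \d is not a recognised string escape, so the
# backslash is dropped and the plain letter d passes through.
sub double_quoted {
    no warnings 'misc';   # silence "Unrecognized escape \d passed through"
    return "not_\dir";
}

# Single quotes: the backslash is kept literally.
sub single_quoted {
    return 'not_\dir';
}

printf "%s (%d chars)\n", double_quoted(), length(double_quoted());
printf "%s (%d chars)\n", single_quoted(), length(single_quoted());
```

The double-quoted form comes out as not_dir, while the single-quoted form keeps the backslash and is one character longer. With warnings enabled and the `no warnings` line removed, Perl also complains about the unrecognised escape at compile time.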

jue
 
sln

Wrong question. Question should have been "Why does Windows not use
forward slashes?" After all Unix predates Windows by a decade and a
half.

I still have Unix programming Manuals 1 & 2 by Bell Laboratories sitting
in my book case (dark blue-green). I can probably still use them, huh?
What year did you say that was?

Seems since (mostly) the beginning, unix had to be compiled with the features
you wanted. Was it all source available or dlls as well?

It's a good thing you don't have to compile Windowz, anything goes wrong, all
you have to do is blame Microsoft, the winner (or weenier)!

Slash unix AND windoz.

-sln
 
Peter J. Holzer

Wrong question. Question should have been "Why does Windows not use
forward slashes?" After all Unix predates Windows by a decade and a
half.

Easy to answer: MS-DOS 1.0 had inherited the forward slash as an option
marker from CP/M. MS-DOS 2.0 added a lot of Unix features (like a
filedescriptor-based I/O API and a hierarchical file system) but they
didn't want an incompatible change to the CLI. So the slash remained the
option marker and the (previously unused) backslash became the directory
separator. But there was a "switchar" (sic!) system call, which could be
used to set and query the current switch (=option) character. All
Microsoft and many third party utilities used this, so you could set the
option character to '-' and then use commands like:

dir -w c:/foo

Regardless of this setting, the system calls always accepted both the
slash and the backslash as a directory separator (and that's still the
case in Windows).

hp
 
Uri Guttman

PJH> Easy to answer: MS-DOS 1.0 had inherited the forward slash as an
PJH> option marker from CP/M. MS-DOS 2.0 added a lot of Unix features
PJH> (like a filedescriptor-based I/O API and a hierarchical file

you have to go back even farther than that. cp/m was derived from dec's
RT-11 which has / for option markers. and most dec OS's did that too.

uri
 
Jürgen Exner

Ben Morrow said:
Quoth Jürgen Exner said:
forward slashes?

Wrong question. Question should have been "Why does Windows not use
forward slashes?" After all Unix predates Windows by a decade and a
half.
[...]
Not every Windows incompatibility with Unix is stupid: they are simply
different OSen with rather different histories and influences.

Fair enough. And certainly true.

But how dare you add reason to an argument about the best editor,
errrm, best OS, errrrm, longest ..... :)

jue
 
sln

PJH> Easy to answer: MS-DOS 1.0 had inherited the forward slash as an
PJH> option marker from CP/M. MS-DOS 2.0 added a lot of Unix features
PJH> (like a filedescriptor-based I/O API and a hierarchical file

you have to go back even farther than that. cp/m was derived from dec's
RT-11 which has / for option markers. and most dec OS's did that too.

uri

Then, the guy who did Dec, did Windowz.

-sln
 
Keith Thompson

l v said:
Windows uses back slashes while unix uses forward slashes. A
mainframe uses periods (.).

That depends on the mainframe, and which OS it's running.
Therefore unix's /dir/dir/dir/not_dir
is windows c:\dir\dir\dir\file

In your Perl code you should use forward slashes even when on
windows. For example:

open(FH, '<', 'c:/dir/dir/dir/file') or die ........

Why? I mean, I'm aware that it will work, but what's the real
advantage of using '/' rather than '\' on Windows?

One *small* advantage is that you don't have to worry about escaping
backslashes in double-quoted strings. (The solution: Remember to
escape the backslashes.)

But I can think of two disadvantages. One is that the string might be
passed to the command processor at some point. Another is that it
might be displayed to the user, and most Windows users probably don't
know that '/' is a valid directory delimiter.

If you're hardwiring file paths like C:\dir\file.txt, you're writing
Windows-specific code anyway. Why not use the form that's most
natural for Windows? (Or, better yet, don't hardwire paths in your
script.)
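
One way to avoid hardwiring a separator at all is the core File::Spec module, which joins path components with whatever the current OS expects. The following is only a sketch of that idea, not something proposed in the thread; the helper name data_path and the directory names are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: join path components using the separator
# of the platform the script happens to be running on.
sub data_path {
    my (@parts) = @_;
    return File::Spec->catfile(@parts);
}

my $path = data_path('dir', 'subdir', 'file.txt');
print "$path\n";

# The open call is then identical on every platform.
if (-e $path) {
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    print "opened $path\n";
    close $fh;
}
```

The printed path uses '/' on unix and '\' on Windows, so anything shown to the user or handed to the command processor is already in the native form.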

[...]
 