Perl Fails To List All The Multiple Matches In The Same Line?

Discussion in 'Perl Misc' started by Cibalo, Jul 8, 2009.

  1. Cibalo

    Cibalo Guest

    Hello,

    I would like to list all the 5-digit zip codes in my database, where
    a line may contain more than one zip code. I created a test
    database, testdb, as follows.

    # echo -e 'zip1 10036; zip2 48226; zip3 94128\nzip4 V8Y 1L1; zip5 400069\nzip6 \nzip7 12345' > testdb
    # cat testdb
    zip1 10036; zip2 48226; zip3 94128
    zip4 V8Y 1L1; zip5 400069
    zip6
    zip7 12345
    # perl -wnl -e '/\b[0-9]{5}\b/g and print "$.: $&";' testdb
    1: 10036
    4: 12345
    # grep -now -e '[0-9]\{5\}' testdb
    1:10036
    48226
    94128
    4:12345
    #

    Even with the global modifier, the above perl script lists only the
    first match on a line that contains several matches. But I can
    make it work with grep as shown above.

    What's wrong with my perl script? What am I missing?

    # perl --version; grep --version
    This is perl, v5.10.0 built for i386-linux-thread-multi
    Copyright 1987-2007, Larry Wall
    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.
    Complete documentation for Perl, including FAQ lists, should be found on
    this system using "man perl" or "perldoc perl". If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    grep (GNU grep) 2.5.1
    Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #

    Thank you very much for your assistance.

    Best Regards,

    cibalo
     
    Cibalo, Jul 8, 2009
    #1

  2. Cibalo wrote:
    >
    > I would like to list all the 5-digit zip codes in my database, of
    > which a line may contain more than one zip codes. Then, I create a
    > test database, testdb, for testing as follows.
    >
    > # echo -e 'zip1 10036; zip2 48226; zip3 94128\nzip4 V8Y 1L1; zip5
    > 400069\nzip6 \nzip7 12345' > testdb
    > # cat testdb
    > zip1 10036; zip2 48226; zip3 94128
    > zip4 V8Y 1L1; zip5 400069
    > zip6
    > zip7 12345
    > # perl -wnl -e '/\b[0-9]{5}\b/g and print "$.: $&";' testdb
    > 1: 10036
    > 4: 12345
    > # grep -now -e '[0-9]\{5\}' testdb
    > 1:10036
    > 48226
    > 94128
    > 4:12345
    > #
    >
    > Even with the global modifier, the above perl script lists only the
    > first pattern match with multiple matches in the same line. But I can
    > make it worked with grep as listed above.


    The problem is that even with the global option the pattern is evaluated
    in scalar context, where each evaluation finds only the next match. Your
    expression runs once per line, so it prints only the first match. You
    need to either match in list context:

    $ echo "zip1 10036; zip2 48226; zip3 94128
    zip4 V8Y 1L1; zip5 400069
    zip6
    zip7 12345
    " | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'
    1: 10036
    1: 48226
    1: 94128
    4: 12345


    Or step through the matches one at a time in scalar context:

    $ echo "zip1 10036; zip2 48226; zip3 94128
    zip4 V8Y 1L1; zip5 400069
    zip6
    zip7 12345
    " | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'
    1: 10036
    1: 48226
    1: 94128
    4: 12345
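For anyone who wants to try this outside a one-liner, here is a minimal sketch of the three behaviours. The sample string is the first line of testdb; the sub names `first_match_only`, `all_matches_list` and `all_matches_scalar` are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

my $re = qr/\b([0-9]{5})\b/;

# Scalar context, tested once: /g finds one match and stops,
# which is what '/.../g and print' does for each input line.
sub first_match_only {
    my ($line) = @_;
    return $line =~ /$re/g ? ($1) : ();
}

# List context: /g returns every match (here, every capture) at once.
sub all_matches_list {
    my ($line) = @_;
    my @found = $line =~ /$re/g;
    return @found;
}

# Scalar context in a loop: each iteration resumes where the
# previous match ended, until the match fails.
sub all_matches_scalar {
    my ($line) = @_;
    my @found;
    while ($line =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

my $line = 'zip1 10036; zip2 48226; zip3 94128';
print "once:   @{[ first_match_only($line) ]}\n";
print "list:   @{[ all_matches_list($line) ]}\n";
print "scalar: @{[ all_matches_scalar($line) ]}\n";
```

The first sub should report only 10036, while the other two should report all three codes.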



    John
    --
    Those people who think they know everything are a great
    annoyance to those of us who do. -- Isaac Asimov
     
    John W. Krahn, Jul 8, 2009
    #2

  3. Cibalo

    Guest

    On Wed, 08 Jul 2009 01:27:10 -0700, "John W. Krahn" <> wrote:

    >Cibalo wrote:
    >>

    <snip>
    >> Even with the global modifier, the above perl script lists only the
    >> first pattern match with multiple matches in the same line. But I can
    >> make it worked with grep as listed above.

    >
    >The problem is that even with the global option the pattern is evaluated
    >in scalar context and so will only match once. You need to either match
    >in list context:
    >
    >$ echo "zip1 10036; zip2 48226; zip3 94128
    >zip4 V8Y 1L1; zip5 400069
    >zip6
    >zip7 12345
    >" | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'

    ^^^^^^^^^^^^^^^^^^^
    Careful, someone might accuse you of obfuscation.

    while (<DATA>)
    {
        print;
        @_ = $_ =~ /\b[0-9]{5}\b/g;
        for (@_)
        {
            print "$.: $_\n";
        }
    }

    -sln
     
    , Jul 9, 2009
    #3
  4. Cibalo

    Guest

    On Wed, 08 Jul 2009 01:27:10 -0700, "John W. Krahn" <> wrote:

    >Cibalo wrote:
    >>
    >> I would like to list all the 5-digit zip codes in my database, of
    >> which a line may contain more than one zip codes. Then, I create a
    >> test database, testdb, for testing as follows.
    >>
    >> # echo -e 'zip1 10036; zip2 48226; zip3 94128\nzip4 V8Y 1L1; zip5
    >> 400069\nzip6 \nzip7 12345' > testdb
    >> # cat testdb
    >> zip1 10036; zip2 48226; zip3 94128
    >> zip4 V8Y 1L1; zip5 400069
    >> zip6
    >> zip7 12345
    >> # perl -wnl -e '/\b[0-9]{5}\b/g and print "$.: $&";' testdb
    >> 1: 10036
    >> 4: 12345
    >> # grep -now -e '[0-9]\{5\}' testdb
    >> 1:10036
    >> 48226
    >> 94128
    >> 4:12345
    >> #
    >>
    >> Even with the global modifier, the above perl script lists only the
    >> first pattern match with multiple matches in the same line. But I can
    >> make it worked with grep as listed above.

    >
    >The problem is that even with the global option the pattern is evaluated
    >in scalar context and so will only match once. You need to either match
    >in list context:
    >
    >$ echo "zip1 10036; zip2 48226; zip3 94128
    >zip4 V8Y 1L1; zip5 400069
    >zip6
    >zip7 12345
    >" | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'
    >1: 10036
    >1: 48226
    >1: 94128
    >4: 12345
    >
    >
    >Or match all patterns in scalar context:
    >
    >$ echo "zip1 10036; zip2 48226; zip3 94128
    >zip4 V8Y 1L1; zip5 400069
    >zip6
    >zip7 12345
    >" | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'

    ^^^^^^^^^^^^^^^^^^^^^^^
    >1: 10036
    >1: 48226
    >1: 94128
    >4: 12345
    >
    >
    >
    >John


    I always enjoy (and marvel at) seeing Unix 1 liner shell
    compositions here. Seems so at ease and natural. I just got Windyo'z.
    When I cut and paste these 1 liners (even though my shell does 'echo')
    each line is treated as a new command, even when I batch it.
    Unfortunately, the {'"} syntax is also different under Windows (and I
    have XP, the great).

    Why can't windows do unix?

    Oh well, I have to settle for the gist and test using a .pl file.
    The last one works as expected; the first (list context) is slightly
    obfuscated, or would be to the OP, who never got past what the /g switch means.

    Btw, nice explanation John.

    while (<DATA>)
    {
        while (/\b([0-9]{5})\b/g)
        {
            print "$.: $1\n";
        }
    }
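A self-contained variant of the same loop, as a rough sketch: it reads the testdb sample from an in-memory filehandle instead of `DATA`, so it can be pasted into a .pl file on any platform. The sub name `list_zips` is hypothetical.

```perl
use strict;
use warnings;

# Sample data copied from the testdb file in the original post.
my $testdb = <<'END';
zip1 10036; zip2 48226; zip3 94128
zip4 V8Y 1L1; zip5 400069
zip6
zip7 12345
END

# Return "line: zip" strings for every 5-digit run in the text.
sub list_zips {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @out;
    while (my $line = <$fh>) {
        # Scalar-context /g in a loop: one match per iteration.
        while ($line =~ /\b([0-9]{5})\b/g) {
            push @out, "$.: $1";    # $. is the current input line number
        }
    }
    close $fh;
    return @out;
}

print "$_\n" for list_zips($testdb);
```

Run as is, it should print the same four lines as the one-liners above.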

    -sln
     
    , Jul 9, 2009
    #4
  5. Cibalo

    Guest

    On Thu, 09 Jul 2009 13:51:48 -0700, wrote:

    >On Wed, 08 Jul 2009 01:27:10 -0700, "John W. Krahn" <> wrote:
    >
    >>Cibalo wrote:
    >>>

    ><snip>
    >>> Even with the global modifier, the above perl script lists only the
    >>> first pattern match with multiple matches in the same line. But I can
    >>> make it worked with grep as listed above.

    >>
    >>The problem is that even with the global option the pattern is evaluated
    >>in scalar context and so will only match once. You need to either match
    >>in list context:
    >>
    >>$ echo "zip1 10036; zip2 48226; zip3 94128
    >>zip4 V8Y 1L1; zip5 400069
    >>zip6
    >>zip7 12345
    >>" | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'

    > ^^^^^^^^^^^^^^^^^^^
    >Carefull, someone might accuse you of obfuscation.
    >
    >while (<DATA>)
    >{
    > print;
    > @_ = $_ =~ /\b[0-9]{5}\b/g;
    > for (@_)
    > {
    > print "$.: $_\n";
    > }
    >}
    >
    >-sln


    Funny how
    for /\b[0-9]{5}\b/g
    works, but this
    @_ = $_ =~ /\b[0-9]{5}\b/g;
    for ()
    doesn't.

    As though the shortcut's got shorted.
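For what it's worth, a list assignment also puts the match in list context, so the two spellings would be expected to give the same result as long as the array is named inside the parentheses. An empty `for ()` iterates over nothing. A small sketch to check this, with illustrative sub names that are not from the thread:

```perl
use strict;
use warnings;

# Matches fed straight to a loop: the foreach list supplies list context.
sub via_for {
    my ($line) = @_;
    my @out;
    push @out, $_ for $line =~ /\b[0-9]{5}\b/g;
    return @out;
}

# Matches stored in an array first: list assignment supplies list context.
sub via_array {
    my ($line) = @_;
    my @matches = $line =~ /\b[0-9]{5}\b/g;
    my @out;
    for (@matches) {
        push @out, $_;
    }
    return @out;
}

my $line = 'zip1 10036; zip2 48226; zip3 94128';
print "for:   @{[ via_for($line) ]}\n";
print "array: @{[ via_array($line) ]}\n";
```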

    -sln
     
    , Jul 9, 2009
    #5
  6. On Thu, 09 Jul 2009 14:09:13 -0700, sln wrote:

    > Why can't windows do unix?


    In this case, it can. Install cygwin.

    HTH,
    M4
     
    Martijn Lievaart, Jul 10, 2009
    #6
  7. Martijn Lievaart <> wrote:
    >On Thu, 09 Jul 2009 14:09:13 -0700, sln wrote:
    >
    >> Why can't windows do unix?


    "Why can't a Ford do a Chevy?"

    >In this case, it can. Install cygwin.


    That doesn't "do Unix" (whatever that is supposed to mean).
    It merely provides the typical Unix utilities in the Windows
    environment.

    jue
     
    Jürgen Exner, Jul 10, 2009
    #7
  8. Cibalo

    Guest

    On Fri, 10 Jul 2009 14:34:52 -0700, Jürgen Exner <> wrote:

    >Martijn Lievaart <> wrote:
    >>On Thu, 09 Jul 2009 14:09:13 -0700, sln wrote:
    >>
    >>> Why can't windows do unix?

    >
    >"Why can't a Ford do a Chevy?"


    In this case, the only thing a 'Chevy' can do is Camaro.
    Ford can do anything. Buy a horse (Mustang).

    >
    >>In this case, it can. Install cygwin.

    >
    >That doesn't "do Unix" (whatever that is supposed to mean).


    Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why forward slashes?
    Is /dir/dir/dir/not_\dir available?

    >It merely provides the typical Unix utilities in the Windows
    >environment.


    This means a compiler right?

    >
    >jue


    -sln
     
    , Jul 13, 2009
    #8
  9. On Mon, 13 Jul 2009 13:36:32 -0700, sln wrote:

    >>It merely provides the typical Unix utilities in the Windows
    >>environment.

    >
    > This means a compiler right?


    No it means binaries (and in typical unix tradition, also a compiler,
    it's one of the binaries).

    M4
     
    Martijn Lievaart, Jul 17, 2009
    #9
  10. Cibalo

    Guest

    On Fri, 17 Jul 2009 22:24:19 +0200, Martijn Lievaart <> wrote:

    >On Mon, 13 Jul 2009 13:36:32 -0700, sln wrote:
    >
    >>>It merely provides the typical Unix utilities in the Windows
    >>>environment.

    >>
    >> This means a compiler right?

    >
    >No it means binaries (and in typical unix tradition, also a compiler,
    >it's one of the binaries).
    >
    >M4


    Since I have to learn everything on my own (because class is too slow),
    they (an employer) would have to pay me (unix deliverables) while I am
    forced to learn. Shifting between different OSes all the time takes a lot
    out of me. I can deliver unix code with a compiler that keeps me in line.
    I'm so lazy I make the compiler do my work. Make it tell me my errors,
    take me to my errors, take me to the docs, make it fix it for me.
    IDEs are my slaves; if they get out of line ... I pop 'em in the mout'

    -sln
     
    , Jul 17, 2009
    #10
  11. Cibalo

    Guest

    On Fri, 17 Jul 2009 16:21:46 -0500, l v <> wrote:

    > wrote:
    >
    >> Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why forward slashes?
    >> Is /dir/dir/dir/not_\dir available?
    >>

    >
    >Windows uses back slashes while unix uses forward slashes. A mainframe
    >uses periods (.).
    >
    >Therefore unix's /dir/dir/dir/not_dir
    > is windows c:\dir\dir\dir\file
    >
    >In your Perl code you should use forward slashes even when on windows.
    >For example:
    >
    >open(FH, '<', 'c:/dir/dir/dir/file') or die ........
    >
    >
    >Although not_dir does not mean a file for unix. For example, not_dir can
    >be a link to another file or directory. But I won't go into that here.


    Hey thanks! I already had an idea 'not_dir can be a link to another file or directory',
    but I didn't go down that path when I scanned that line somewhere.

    The ///// slashes are a Perl comfort thing; unfortunately, intrinsic separators assigned
    to my $sep are platform useless given the former, thanks. But OS normalization is, like
    you said, maybe not guaranteed. I just hate OSes.

    -sln
     
    , Jul 17, 2009
    #11
  12. Cibalo

    Guest

    On Fri, 17 Jul 2009 16:13:55 -0500, l v <> wrote:

    > wrote:
    >> On Wed, 08 Jul 2009 01:27:10 -0700, "John W. Krahn" <> wrote:
    >>
    >>> Cibalo wrote:
    >>>> I would like to list all the 5-digit zip codes in my database, of
    >>>> which a line may contain more than one zip codes. Then, I create a
    >>>> test database, testdb, for testing as follows.
    >>>>
    >>>> # echo -e 'zip1 10036; zip2 48226; zip3 94128\nzip4 V8Y 1L1; zip5
    >>>> 400069\nzip6 \nzip7 12345' > testdb
    >>>> # cat testdb
    >>>> zip1 10036; zip2 48226; zip3 94128
    >>>> zip4 V8Y 1L1; zip5 400069
    >>>> zip6
    >>>> zip7 12345
    >>>> # perl -wnl -e '/\b[0-9]{5}\b/g and print "$.: $&";' testdb
    >>>> 1: 10036
    >>>> 4: 12345
    >>>> # grep -now -e '[0-9]\{5\}' testdb
    >>>> 1:10036
    >>>> 48226
    >>>> 94128
    >>>> 4:12345
    >>>> #
    >>>>
    >>>> Even with the global modifier, the above perl script lists only the
    >>>> first pattern match with multiple matches in the same line. But I can
    >>>> make it worked with grep as listed above.
    >>> The problem is that even with the global option the pattern is evaluated
    >>> in scalar context and so will only match once. You need to either match
    >>> in list context:
    >>>
    >>> $ echo "zip1 10036; zip2 48226; zip3 94128
    >>> zip4 V8Y 1L1; zip5 400069
    >>> zip6
    >>> zip7 12345
    >>> " | perl -lne'print "$.: $_" for /\b[0-9]{5}\b/g'
    >>> 1: 10036
    >>> 1: 48226
    >>> 1: 94128
    >>> 4: 12345
    >>>
    >>>
    >>> Or match all patterns in scalar context:
    >>>
    >>> $ echo "zip1 10036; zip2 48226; zip3 94128
    >>> zip4 V8Y 1L1; zip5 400069
    >>> zip6
    >>> zip7 12345
    >>> " | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'

    >> ^^^^^^^^^^^^^^^^^^^^^^^
    >>> 1: 10036
    >>> 1: 48226
    >>> 1: 94128
    >>> 4: 12345
    >>>
    >>>
    >>>
    >>> John

    >>
    >> I always enjoy (and marvel at) seeing Unix 1 liner shell
    >> compositions here. Seems so at ease and natural. I just got Windyo'z.
    >> When I cut and paste these 1 liners (even though my shell does 'echo')
    >> each line is treated as a new command, even when I batch it.
    >> Unfortunately, the {'"} syntax is also different under Windows (and I
    >> have XP, the great).
    >>

    >
    >You just need to adjust the one-liner a bit to make the unix one-liners
    > work on windows.
    >
    >unix:
    >$ echo "zip1 10036; zip2 48226; zip3 94128" | perl -lne'print "$.: $1" while /\b([0-9]{5})\b/g'
    >1: 10036
    >1: 48226
    >1: 94128
    >
    >
    >windows:
    >d:\>echo "zip1 10036; zip2 48226; zip3 94128" | perl -nle "print qq($.: $1) while /\b([0-9]{5})\b/g"
    >1: 10036
    >1: 48226
    >1: 94128
    >
    >I tend to use the qq() syntax when I need a double-quote in the
    >one-liner for windows.
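The qq() trick works because `qq(...)` is Perl's generic double-quote operator, so interpolation is unchanged and only the delimiter differs. A tiny sketch, with hypothetical sub names, that compares the two spellings:

```perl
use strict;
use warnings;

# qq(...) interpolates exactly like "...", but needs no literal
# double quote, which is handy inside a cmd.exe "..." argument.
sub format_hit {
    my ($line_no, $zip) = @_;
    return qq($line_no: $zip);
}

# The same string written with ordinary double quotes.
sub format_hit_dq {
    my ($line_no, $zip) = @_;
    return "$line_no: $zip";
}

print format_hit(1, '10036'), "\n";
print format_hit_dq(1, '10036'), "\n";
```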


    Thanks Len, I really appreciate that!

    -sln
     
    , Jul 17, 2009
    #12
  13. l v <> wrote:
    > wrote:
    >
    >> Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why forward slashes?


    Wrong question. Question should have been "Why does Windows not use
    forward slashes?" After all Unix predates Windows by a decade and a
    half.

    >> Is /dir/dir/dir/not_\dir available?


    Sure, why not?
    I'm not absolutely certain but I think this should be the same name as
    just not_dir. The \d is not a special character (unlike \t or \r),
    therefore the escape should be ignored.

    jue
     
    Jürgen Exner, Jul 17, 2009
    #13
  14. Cibalo

    Guest

    On Fri, 17 Jul 2009 15:03:43 -0700, Jürgen Exner <> wrote:

    >l v <> wrote:
    >> wrote:
    >>
    >>> Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why forward slashes?

    >
    >Wrong question. Question should have been "Why does Windows not use
    >forward slashes?" After all Unix predates Windows by a decade and a
    >half.


    I still have Unix Programming Manuals 1 & 2 by Bell Laboratories sitting
    in my book case (dark blue-green). I can probably still use them, huh?
    What year did you say that was?

    Seems since (mostly) the beginning, unix had to be compiled with the features
    you wanted. Was it all source available or dlls as well?

    It's a good thing you don't have to compile Windowz; if anything goes wrong, all
    you have to do is blame Microsoft, the winner (or weenier)!

    Slash unix AND windoz.

    -sln
     
    , Jul 17, 2009
    #14
  15. On 2009-07-17 22:03, Jürgen Exner <> wrote:
    > l v <> wrote:
    >> wrote:
    >>
    >>> Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why forward slashes?

    >
    > Wrong question. Question should have been "Why does Windows not use
    > forward slashes?" After all Unix predates Windows by a decade and a
    > half.


    Easy to answer: MS-DOS 1.0 had inherited the forward slash as an option
    marker from CP/M. MS-DOS 2.0 added a lot of Unix features (like a
    filedescriptor-based I/O API and a hierarchical file system) but they
    didn't want an incompatible change to the CLI. So the slash remained the
    option marker and the (previously unused) backslash became the directory
    separator. But there was a "switchar" (sic!) system call, which could be
    used to set and query the current switch (=option) character. All
    Microsoft and many third party utilities used this, so you could set the
    option character to '-' and then use commands like:

    dir -w c:/foo

    Regardless of this setting, the system calls always accepted both the
    slash and the backslash as a directory separator (and that's still the
    case in Windows).

    hp
     
    Peter J. Holzer, Jul 17, 2009
    #15
  16. Cibalo

    Uri Guttman Guest

    >>>>> "PJH" == Peter J Holzer <> writes:

    PJH> Easy to answer: MS-DOS 1.0 had inherited the forward slash as an
    PJH> option marker from CP/M. MS-DOS 2.0 added a lot of Unix features
    PJH> (like a filedescriptor-based I/O API and a hierarchical file

    you have to go back even farther than that. cp/m was derived from dec's
    RT-11 which has / for option markers. and most dec OS's did that too.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Free Perl Training --- http://perlhunter.com/college.html ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Jul 18, 2009
    #16
  17. Ben Morrow <> wrote:
    >Quoth Jürgen Exner <>:
    >> l v <> wrote:
    >> > wrote:
    >> >
    >> >> Why does Unix do /dir/dir/dir/not_dir (whatever that means), and why

    >> forward slashes?
    >>
    >> Wrong question. Question should have been "Why does Windows not use
    >> forward slashes?" After all Unix predates Windows by a decade and a
    >> half.

    [...]
    >Not every Windows incompatibility with Unix is stupid: they are simply
    >different OSen with rather different histories and influences.


    Fair enough. And certainly true.

    But how dare you add reason to an argument about the best editor,
    errrm, best OS, errrrm , longest ..... :)

    jue
     
    Jürgen Exner, Jul 18, 2009
    #17
  18. Cibalo

    Guest

    On Fri, 17 Jul 2009 19:45:03 -0400, Uri Guttman <> wrote:

    >>>>>> "PJH" == Peter J Holzer <> writes:

    >
    > PJH> Easy to answer: MS-DOS 1.0 had inherited the forward slash as an
    > PJH> option marker from CP/M. MS-DOS 2.0 added a lot of Unix features
    > PJH> (like a filedescriptor-based I/O API and a hierarchical file
    >
    >you have to go back even farther than that. cp/m was derived from dec's
    >RT-11 which has / for option markers. and most dec OS's did that too.
    >
    >uri


    Then, the guy who did Dec, did Windowz.

    -sln
     
    , Jul 18, 2009
    #18
  19. l v <> writes:
    [...]
    > Windows uses back slashes while unix uses forward slashes. A
    > mainframe uses periods (.).


    That depends on the mainframe, and which OS it's running.

    > Therefore unix's /dir/dir/dir/not_dir
    > is windows c:\dir\dir\dir\file
    >
    > In your Perl code you should use forward slashes even when on
    > windows. For example:
    >
    > open(FH, '<', 'c:/dir/dir/dir/file') or die ........


    Why? I mean, I'm aware that it will work, but what's the real
    advantage of using '/' rather than '\' on Windows?

    One *small* advantage is that you don't have to worry about escaping
    backslashes in double-quoted strings. (The solution: Remember to
    escape the backslashes.)
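To illustrate the escaping point, here is a small sketch. The path `C:\temp\new.txt` is a hypothetical example, chosen only because `\t` and `\n` are escapes Perl recognizes in double-quoted strings.

```perl
use strict;
use warnings;

# Hypothetical paths chosen so that the escapes are easy to spot.
my $single  = 'C:\temp\new.txt';     # single quotes: backslashes kept literally
my $double  = "C:\temp\new.txt";     # double quotes: \t is a tab, \n a newline
my $escaped = "C:\\temp\\new.txt";   # doubled backslashes survive interpolation
my $forward = 'C:/temp/new.txt';     # forward slashes: nothing to escape

printf "%-8s length %d\n", 'single',  length($single);
printf "%-8s length %d\n", 'double',  length($double);
printf "%-8s length %d\n", 'escaped', length($escaped);
printf "%-8s length %d\n", 'forward', length($forward);
```

The unescaped double-quoted string comes out two characters shorter because each backslash pair collapses into a single control character.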

    But I can think of two disadvantages. One is that the string might be
    passed to the command processor at some point. Another is that it
    might be displayed to the user, and most Windows users probably don't
    know that '/' is a valid directory delimiter.

    If you're hardwiring file paths like C:\dir\file.txt, you're writing
    Windows-specific code anyway. Why not use the form that's most
    natural for Windows? (Or, better yet, don't hardwire paths in your
    script.)
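One way to avoid hardwiring a separator at all is to build paths from components with the core File::Spec modules. This is only a sketch: the components are hypothetical, and the Unix and Win32 flavours are called explicitly so both results can be seen on any machine. In ordinary code, `File::Spec->catfile` picks the flavour for the platform the script is running on.

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;

# Hypothetical path components; nothing here touches the file system.
my @parts = ('dir', 'sub', 'file.txt');

# Join the components using Unix rules.
sub unix_path  { return File::Spec::Unix->catfile(@_) }

# Join the same components using Windows rules.
sub win32_path { return File::Spec::Win32->catfile(@_) }

print unix_path(@parts), "\n";    # dir/sub/file.txt
print win32_path(@parts), "\n";   # dir\sub\file.txt
```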

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jul 18, 2009
    #19