Help with regexp. Can you do better?

Discussion in 'Perl Misc' started by papaDoc, Sep 27, 2005.

  1. papaDoc

    papaDoc Guest

    Hi,

    I'm trying to parse the output of CVS loginfo to update the script
    activitymail and I have a problem.

    I'm able to parse the line but I don't like the way I'm doing it.
    Can you help me get rid of the $delim variable which I need in my
    current algo ?

    I want to get a list like this
    @dest[0] = "excavator_resources.h 1.129 1.130.23"
    @dest[0] = "gfxGround.cpp 1.12 1.13"
    etc

    #!C:/DevTools/mks/mksnt/perl.exe
    #

    $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
    mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
    1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
    1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
    1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
    playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

    $delim =
    "This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";

    $src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;
    @dest = split( /$delim/ , $src);

    print "\n";
    foreach $d (@dest)
    {
        print "($d)\n";
    }
    print "\n";


    Remi
     
    papaDoc, Sep 27, 2005
    #1

  2. papaDoc

    Paul Lalli Guest

    papaDoc wrote:
    > Hi,
    >
    > I'm trying to parse the output of CVS loginfo to update the script
    > activitymail and I have a problem.
    >
    > I'm able to parse the line but I don't like the way I'm doing it.
    > Can you help me get rid of the $delim variable which I need in my
    > current algo ?
    >
    > I want to get a list like this
    > @dest[0] = "excavator_resources.h 1.129 1.130.23"
    > @dest[0] = "gfxGround.cpp 1.12 1.13"


    I have no idea what a "list like that" would be. I assume you meant
    $dest[0] for the first line, and $dest[1] for the second.

    perldoc -q "difference" (scroll down a bit)
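
    As a minimal sketch of the distinction that FAQ entry covers (the two-element array below is just sample data taken from the question): $dest[0] is a single element, while @dest[...] is a slice that yields a list.

```perl
use strict;
use warnings;

# Sample data taken from the question; any two strings would do.
my @dest = ("excavator_resources.h 1.129 1.130.23", "gfxGround.cpp 1.12 1.13");

# $dest[0] is a single element (a scalar).
my $first = $dest[0];

# @dest[0, 1] is a slice: a list of the elements at the given indices.
my @both = @dest[0, 1];

print "$first\n";
print scalar(@both), "\n";
```

    With warnings enabled, writing @dest[0] where a single element is meant typically draws a "Scalar value ... better written as" warning.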

    > $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
    > mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
    > 1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
    > 1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
    > 1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
    > playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";
    >
    > $delim =
    > "This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";
    >
    > $src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;


    use warnings; would have told you to use $1 and $2 rather than \1 and
    \2 there.
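
    A small sketch of the same substitution with $1 and $2 in the replacement. The shortened $src and the "|" marker are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Shortened, hypothetical input in the same "file old new" layout.
my $src = "a.h 1.1 1.2 b.cpp 2.1 2.2";

# In the replacement part of s///, refer to captures as $1 and $2.
# \1 and \2 are meant for backreferences inside the pattern itself.
(my $marked = $src) =~ s/([\d.]+\s[\d.]+)\s(\S)/$1|$2/g;

print "$marked\n";
```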

    > @dest = split( /$delim/ , $src);


    I have no idea why you're jumping through these hoops. Are you aware
    that a pattern match in list context returns a list of the matches?

    my @dest = $src =~ /(\S+\s[\d\.]+\s[\d\.]+)/g;
    Match all instances of: one or more non-whitespace characters, a single
    whitespace character, one or more (decimal points or digits), a
    whitespace character, and another one or more (decimal points or digits).
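
    To see the list-context match on its own, here is a short sketch using a shortened copy of the string from the question (three entries only, on one line):

```perl
use strict;
use warnings;

# Shortened copy of the input from the question, kept on a single line.
my $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13 .configrc 1.2 1.3";

# With /g in list context, the match returns every captured substring.
my @dest = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;

print "($_)\n" for @dest;
```

    No delimiter variable or split is needed; each element of @dest is one "file old new" triple.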

    Paul Lalli
     
    Paul Lalli, Sep 27, 2005
    #2

  3. papaDoc

    Dave Weaver Guest

    On 27 Sep 2005 05:24:48 -0700, papaDoc <> wrote:
    > Hi,
    >
    > I'm trying to parse the output of CVS loginfo to update the script
    > activitymail and I have a problem.
    >
    > I'm able to parse the line but I don't like the way I'm doing it.
    > Can you help me get rid of the $delim variable which I need in my
    > current algo ?
    >
    > I want to get a list like this
    > @dest[0] = "excavator_resources.h 1.129 1.130.23"
    > @dest[0] = "gfxGround.cpp 1.12 1.13"
    > etc
    >


    How about:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
    mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
    1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
    1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
    1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
    playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

    my @dest = $src =~ /(.*?\s[\d\.]+\s[\d\.]+)\s?/g;

    use Data::Dumper;
    print Dumper \@dest;
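
    One practical difference between a leading .*? and a leading \S+ shows up if a file name contains a space. The input below is hypothetical (none of the names in the question contain spaces), so treat this as a sketch of the behaviour rather than a recommendation.

```perl
use strict;
use warnings;

# Hypothetical input: the first file name contains a space.
my $src = "my file.c 1.1 1.2 b.c 1.3 1.4";

# \S+ cannot cross the space, so only "file.c" is picked up.
my @strict = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;

# .*? starts where the previous match ended, so "my " stays attached.
my @lazy = $src =~ /(.*?\s[\d.]+\s[\d.]+)\s?/g;

print "($_)\n" for @strict;
print "($_)\n" for @lazy;
```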
     
    Dave Weaver, Sep 27, 2005
    #3
  4. papaDoc

    Guest

    papaDoc wrote:
    > Hi,
    >
    > I'm trying to parse the output of CVS loginfo to update the script
    > activitymail and I have a problem.
    >
    > I'm able to parse the line but I don't like the way I'm doing it.
    > Can you help me get rid of the $delim variable which I need in my
    > current algo ?
    >
    > I want to get a list like this
    > @dest[0] = "excavator_resources.h 1.129 1.130.23"
    > @dest[0] = "gfxGround.cpp 1.12 1.13"
    > etc


    Nitpicky, but @dest[0] is better written as $dest[0] (until Perl 6)

    >
    > #!C:/DevTools/mks/mksnt/perl.exe
    > #
    >
    > $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
    > mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
    > 1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
    > 1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
    > 1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
    > playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";
    >
    > $delim =
    > "This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";
    >
    > $src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;
    > @dest = split( /$delim/ , $src);
    >
    > print "\n";
    > foreach $d (@dest)
    > {
    > print "($d)\n";
    > }
    > print "\n";
    >
    >


    Here're a couple:

    @dest = $src =~ /(\S+\s+[\d.?]+\s+[\d.?]+\s*)/g;

    The [\d.] class doesn't enforce any order of digits and periods, so
    this might be slightly preferable, although it's still not totally right:

    @dest = $src =~ /(\S+     # group starting with non-whitespace
        \s+                   # followed by whitespace
        (?:\d\.?){1,}         # non-capturing: digit and optional period (1 or more)
        \s+                   # followed by whitespace
        (?:\d\.?){1,}         # non-capturing: digit and optional period (1 or more)
        \s*                   # whitespace (0 or more since none at end)
        )                     # end grouping
        /xg;

    Output:

    excavator_resources.h 1.129 1.130.23
    gfxGround.cpp 1.12 1.13
    mgrDemo.cpp 1.72 1.73
    objExcavator.cpp 1.42 1.43
    pedModule_Digging.cpp 1.25 1.26
    pedModule_DumpTrench.cpp 1.18 1.19
    pedModule_DumpTruck.cpp 1.27 1.28
    pedModule_TargetSearch.cpp 1.17 1.18
    pedTerrainDef.cpp 1.6 1.7
    pedTrenchDef.cpp 1.18 1.19
    pedTrial_DumpTruck.cpp 1.28 1.29
    playback.cpp 1.1 1.2
    .configrc 1.2 1.3
    ~configrc 1.2 1.3
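
    One detail of the patterns above: because \s* sits inside the capturing group, the separator after each triple ends up in the captured element. A minimal sketch of the difference, on a shortened input (the variable names @inside and @outside are only for illustration):

```perl
use strict;
use warnings;

# Shortened input in the same layout as the question.
my $src = "gfxGround.cpp 1.12 1.13 mgrDemo.cpp 1.72 1.73";

# \s* inside the capture: the separator becomes part of each element.
my @inside = $src =~ /(\S+\s+(?:\d\.?)+\s+(?:\d\.?)+\s*)/g;

# \s* outside the capture: the separator is consumed but not kept.
my @outside = $src =~ /(\S+\s+(?:\d\.?)+\s+(?:\d\.?)+)\s*/g;

# Brackets make any trailing whitespace visible.
print "[$_]\n" for @inside, @outside;
```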

    hth,
    --
    Charles DeRykus
     
    , Sep 27, 2005
    #4
  5. papaDoc

    Paul Lalli Guest

    wrote:
    > @dest = $src =~ /(\S+\s+[\d.?]+\s+[\d.?]+\s*)/g;

                              ^^^^^^    ^^^^^^

    This doesn't mean what you think it means. ? is not special in a
    character class. Each of those is searching for one or more digits,
    periods, or question marks.
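
    A quick way to check this, sketched with made-up test strings:

```perl
use strict;
use warnings;

# Inside [...], ? is a literal question mark, so "1?2" is accepted.
my $class_accepts = ("1?2" =~ /^[\d.?]+$/) ? 1 : 0;

# Outside a class, ? makes the preceding item optional.
my $group_accepts_dots = ("1.2" =~ /^(?:\d\.?)+$/) ? 1 : 0;
my $group_accepts_qm   = ("1?2" =~ /^(?:\d\.?)+$/) ? 1 : 0;

print "class accepts 1?2: $class_accepts\n";
print "group accepts 1.2: $group_accepts_dots\n";
print "group accepts 1?2: $group_accepts_qm\n";
```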

    Paul Lalli
     
    Paul Lalli, Sep 27, 2005
    #5
  6. papaDoc

    Guest

    Paul Lalli wrote:
    > wrote:
    > > @dest = $src =~ /(\S+\s+[\d.?]+\s+[\d.?]+\s*)/g;

    >                           ^^^^^^    ^^^^^^
    >
    > This doesn't mean what you think it means. ? is not special in a
    > character class. Each of those is searching for one or more digits,
    > periods, or question marks.


    Right, I must've been thinking ahead to the class-less alternative
    I suggested.

    --
    Charles DeRykus
     
    , Sep 27, 2005
    #6
  7. William James

    Paul Lalli wrote:

    > my @dest = $src =~ /(\S+\s[\d\.]+\s[\d\.]+)/g;
    > Match all instances of: one or more non-whitespace, a single
    > whitespace, one or more (decimal or digit), a whitespace, and another
    > one or more (decimal or digit).


    It's not necessary to escape . in a character class:

    my @dest = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;
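
    A brief sketch comparing the two spellings on a shortened input; the "x" test string is only there to show that an unescaped . inside a class does not mean "any character".

```perl
use strict;
use warnings;

# Shortened input in the same layout as the question.
my $src = "gfxGround.cpp 1.12 1.13 playback.cpp 1.1 1.2";

my @escaped   = $src =~ /(\S+\s[\d\.]+\s[\d\.]+)/g;
my @unescaped = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;

# Both patterns return the same two elements.
print "same result\n" if "@escaped" eq "@unescaped";

# Inside [...], an unescaped . is still a literal period.
my $dot_is_literal = ("x" =~ /^[.]$/) ? 0 : 1;
print "dot is literal: $dot_is_literal\n";
```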
     
    William James, Sep 28, 2005
    #7
  8. papaDoc

    Anno Siegel Guest

    papaDoc <> wrote in comp.lang.perl.misc:
    > Hi,
    >
    > I'm trying to parse the output of CVS loginfo to update the script
    > activitymail and I have a problem.
    >
    > I'm able to parse the line but I don't like the way I'm doing it.
    > Can you help me get rid of the $delim variable which I need in my
    > current algo ?
    >
    > I want to get a list like this
    > @dest[0] = "excavator_resources.h 1.129 1.130.23"
    > @dest[0] = "gfxGround.cpp 1.12 1.13"
    > etc
    >
    > #!C:/DevTools/mks/mksnt/perl.exe
    > #
    >
    > $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
    > mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
    > 1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
    > 1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
    > 1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
    > playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";
    >
    > $delim =
    > "This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";
    >
    > $src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;
    > @dest = split( /$delim/ , $src);
    >
    > print "\n";
    > foreach $d (@dest)
    > {
    > print "($d)\n";
    > }
    > print "\n";


    Split on blanks that are followed by a non-digit:

    my @dest = split / (?=\D)/, $src;
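
    A sketch of the lookahead split, assuming $src is a single line with single spaces between fields. If the real string contains line breaks, something like \s+ in place of the literal blank may be needed. The second input is a hypothetical counter-example: a file name that starts with a digit is not split off.

```perl
use strict;
use warnings;

# Shortened input on one line, single spaces between fields.
my $src = "gfxGround.cpp 1.12 1.13 .configrc 1.2 1.3 ~configrc 1.2 1.3";

# The lookahead (?=\D) tests the next character without consuming it,
# so the first character of each file name stays in its element.
my @dest = split / (?=\D)/, $src;
print "($_)\n" for @dest;

# Hypothetical counter-example: "3d.c" starts with a digit, so no split occurs.
my @stuck = split / (?=\D)/, "a.c 1.1 1.2 3d.c 1.1 1.2";
print scalar(@stuck), "\n";
```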

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Sep 28, 2005
    #8
