Help with regexp. Can you do better?

papaDoc

Hi,

I'm trying to parse the output of CVS loginfo to update the script
activitymail and I have a problem.

I'm able to parse the line but I don't like the way I'm doing it.
Can you help me get rid of the $delim variable which I need in my
current algo ?

I want to get a list like this
@dest[0] = "excavator_resources.h 1.129 1.130.23"
@dest[0] = "gfxGround.cpp 1.12 1.13"
etc

#!C:/DevTools/mks/mksnt/perl.exe
#

$src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

$delim =
"This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";

$src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;
@dest = split( /$delim/ , $src);

print "\n";
foreach $d (@dest)
{
print "($d)\n";
}
print "\n";


Remi
(e-mail address removed)
 
Paul Lalli

papaDoc said:
Hi,

I'm trying to parse the output of CVS loginfo to update the script
activitymail and I have a problem.

I'm able to parse the line but I don't like the way I'm doing it.
Can you help me get rid of the $delim variable which I need in my
current algo ?

I want to get a list like this
@dest[0] = "excavator_resources.h 1.129 1.130.23"
@dest[0] = "gfxGround.cpp 1.12 1.13"

I have no idea what a "list like that" would be. I assume you meant
$dest[0] for the first line, and $dest[1] for the second.

perldoc -q "difference" (scroll down a bit)
$src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

$delim =
"This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";

$src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;

use warnings; would have told you to use $1 and $2 rather than \1 and
\2 there.
@dest = split( /$delim/ , $src);

I have no idea why you're jumping through these hoops. Are you aware
that a pattern match in list context returns a list of the matches?

my @dest = $src =~ /(\S+\s[\d\.]+\s[\d\.]+)/g;
Match all instances of: one or more non-whitespace characters, a single
whitespace character, one or more digits or periods, another whitespace
character, and again one or more digits or periods.
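
A minimal sketch of the list-context match, assuming a shortened, single-line sample in the same "name rev rev" layout as $src (the sample string is made up for illustration and is not the full CVS output):

```perl
use strict;
use warnings;

# Shortened sample; assumes entries are separated by single spaces.
my $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13 .configrc 1.2 1.3";

# In list context, m//g returns the captures from every match,
# so no temporary delimiter and no split are needed.
my @dest = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;

print "($_)\n" for @dest;
```

Each element of @dest should hold one file name followed by its two revision numbers.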

Paul Lalli
 
Dave Weaver

Hi,

I'm trying to parse the output of CVS loginfo to update the script
activitymail and I have a problem.

I'm able to parse the line but I don't like the way I'm doing it.
Can you help me get rid of the $delim variable which I need in my
current algo ?

I want to get a list like this
@dest[0] = "excavator_resources.h 1.129 1.130.23"
@dest[0] = "gfxGround.cpp 1.12 1.13"
etc

How about:

#!/usr/bin/perl
use warnings;
use strict;

my $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

my @dest = $src =~ /(.*?\s[\d\.]+\s[\d\.]+)\s?/g;

use Data::Dumper;
print Dumper \@dest;
 
ced

papaDoc said:
Hi,

I'm trying to parse the output of CVS loginfo to update the script
activitymail and I have a problem.

I'm able to parse the line but I don't like the way I'm doing it.
Can you help me get rid of the $delim variable which I need in my
current algo ?

I want to get a list like this
@dest[0] = "excavator_resources.h 1.129 1.130.23"
@dest[0] = "gfxGround.cpp 1.12 1.13"
etc

Nitpicky, but @dest[0] is better written as $dest[0] (until Perl 6).
#!C:/DevTools/mks/mksnt/perl.exe
#

$src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

$delim =
"This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";

$src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;
@dest = split( /$delim/ , $src);

print "\n";
foreach $d (@dest)
{
print "($d)\n";
}
print "\n";

Here're a couple:

@dest = $src =~ /(\S+\s+[\d.?]+\s+[\d.?]+\s*)/g;

The [\d.] class doesn't enforce any order of digits and periods, so this
might be slightly preferable, although it's still not totally right:

@dest = $src =~ /(\S+ # group starting with non-whitespace
\s+ # followed by whitespace
(?:\d\.?){1,} # non-capturing: digit, optional period (1 or more)
\s+ # followed by whitespace
(?:\d\.?){1,} # non-capturing: digit, optional period (1 or more)
\s* # whitespace (0 or more, since none at end)
) # end grouping
/xg;

Output:

excavator_resources.h 1.129 1.130.23
gfxGround.cpp 1.12 1.13
mgrDemo.cpp 1.72 1.73
objExcavator.cpp 1.42 1.43
pedModule_Digging.cpp 1.25 1.26
pedModule_DumpTrench.cpp 1.18 1.19
pedModule_DumpTruck.cpp 1.27 1.28
pedModule_TargetSearch.cpp 1.17 1.18
pedTerrainDef.cpp 1.6 1.7
pedTrenchDef.cpp 1.18 1.19
pedTrial_DumpTruck.cpp 1.28 1.29
playback.cpp 1.1 1.2
.configrc 1.2 1.3
~configrc 1.2 1.3

hth,
 
Paul Lalli

@dest = $src =~ /(\S+\s+[\d.?]+\s+[\d.?]+\s*)/g;
^^^^^^ ^^^^^^

This doesn't mean what you think it means. ? is not special in a
character class. Each of those is searching for one or more digits,
periods, or question marks.
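
A small sketch of the difference, using made-up test strings: inside a character class the ? is just another literal character, while outside a class it is a quantifier.

```perl
use strict;
use warnings;

# Inside [...], '?' is literal: the class accepts digits, periods and '?'.
my $class = qr/^[\d.?]+$/;

my $rev_ok   = "1.130.23" =~ $class ? 1 : 0;   # a normal revision number
my $qmark_ok = "1.?3"     =~ $class ? 1 : 0;   # also accepted, because of the '?'

# Without '?' in the class, the same string is rejected.
my $strict_ok = "1.?3" =~ /^[\d.]+$/ ? 1 : 0;

# Outside a class, '?' is a quantifier: \d\.? is a digit plus an optional period.
my $quant_ok = "1.130.23" =~ /^(?:\d\.?)+$/ ? 1 : 0;

print "class accepts revision:  $rev_ok\n";
print "class accepts '1.?3':    $qmark_ok\n";
print "[\\d.] accepts '1.?3':    $strict_ok\n";
print "quantifier form matches: $quant_ok\n";
```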

Paul Lalli
 
ced

Paul said:
@dest = $src =~ /(\S+\s+[\d.?]+\s+[\d.?]+\s*)/g;
^^^^^^ ^^^^^^

This doesn't mean what you think it means. ? is not special in a
character class. Each of those is searching for one or more digits,
periods, or question marks.

Right, I must've been thinking ahead to the class-less alternative
I suggested.
 
William James

Paul said:
my @dest = $src =~ /(\S+\s[\d\.]+\s[\d\.]+)/g;
Match all instances of: one or more non-whitespace characters, a single
whitespace character, one or more digits or periods, another whitespace
character, and again one or more digits or periods.

It's not necessary to escape . in a character class:

my @dest = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;
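
A quick check, on a made-up two-entry sample, that the escaped and unescaped forms should behave the same, and that the bare . inside the class still means a literal period rather than "any character":

```perl
use strict;
use warnings;

# Hypothetical short sample in the same layout as $src.
my $src = "gfxGround.cpp 1.12 1.13 mgrDemo.cpp 1.72 1.73";

my @escaped   = $src =~ /(\S+\s[\d\.]+\s[\d\.]+)/g;
my @unescaped = $src =~ /(\S+\s[\d.]+\s[\d.]+)/g;

print "same result\n" if "@escaped" eq "@unescaped";

# The unescaped dot is still literal inside [...]: a letter is not accepted.
print "no match for 1x12\n" unless "1x12" =~ /^[\d.]+$/;
```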
 
Anno Siegel

papaDoc said:
Hi,

I'm trying to parse the output of CVS loginfo to update the script
activitymail and I have a problem.

I'm able to parse the line but I don't like the way I'm doing it.
Can you help me get rid of the $delim variable which I need in my
current algo ?

I want to get a list like this
@dest[0] = "excavator_resources.h 1.129 1.130.23"
@dest[0] = "gfxGround.cpp 1.12 1.13"
etc

#!C:/DevTools/mks/mksnt/perl.exe
#

$src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13
mgrDemo.cpp 1.72 1.73 objExcavator.cpp 1.42 1.43 pedModule_Digging.cpp
1.25 1.26 pedModule_DumpTrench.cpp 1.18 1.19 pedModule_DumpTruck.cpp
1.27 1.28 pedModule_TargetSearch.cpp 1.17 1.18 pedTerrainDef.cpp 1.6
1.7 pedTrenchDef.cpp 1.18 1.19 pedTrial_DumpTruck.cpp 1.28 1.29
playback.cpp 1.1 1.2 .configrc 1.2 1.3 ~configrc 1.2 1.3";

$delim =
"This-Is-A-Simlog-Delimiter-And-No-Filenames-Should-Be-Called-This";

$src =~ s/([\d\.]+\s[\d\.]+)\s(\S)/\1$delim\2/g;
@dest = split( /$delim/ , $src);

print "\n";
foreach $d (@dest)
{
print "($d)\n";
}
print "\n";

Split on blanks that are followed by a non-digit:

my @dest = split / (?=\D)/, $src;
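
A sketch of how the lookahead split behaves, assuming entries are separated by single spaces (the sample strings are shortened and made up). One caveat that follows from the pattern itself: a file name that begins with a digit is not preceded by a split point.

```perl
use strict;
use warnings;

# Shortened single-line sample in the same layout as $src.
my $src = "excavator_resources.h 1.129 1.130.23 gfxGround.cpp 1.12 1.13 .configrc 1.2 1.3 ~configrc 1.2 1.3";

# Split on a space only when the next character is not a digit.
# The lookahead (?=\D) checks that character without consuming it.
my @dest = split / (?=\D)/, $src;
print "($_)\n" for @dest;

# Caveat: "3d.c" starts with a digit, so nothing is split here
# and the whole string comes back as a single element.
my @odd = split / (?=\D)/, "a.c 1.1 1.2 3d.c 1.4 1.5";
print scalar(@odd), " element(s) for the digit-leading name\n";
```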

Anno
 
