reg expression with input line


sam

Hi,

I would like to write a Perl script to parse each line read from a text
file. I ended up with some Perl code, shown below:

($prodcode,$custname,$qty,$cost,$date,$prodname) =
/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
"12031361 ABC3 567.00 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";

print "Result: ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
$date eq "" or $prodname eq "") {
print "Failed to parse input file.\n";
exit;
}

But the parser fails to parse the input text; it returns an empty string.
What is wrong with the above code, especially the part I wrote for
parsing $date?

Thanks
Sam
 

Arndt Jonasson

sam said:
[...]

To begin with, you should ask perl for warnings, either with the -w
option, or with the directive "use warnings;". Then it will tell you
that you get uninitialized values on the "print" line. Your test already
shows that, but you will see that in fact all variables are uninitialized
(meaning their value is 'undef').

It also tells you "Useless use of a constant in void context". It points
out the line where the statement starts, not the place where the constant
starts, but there is only one constant here anyway, and it's the data
string.

The immediate suspicion is that
($var) = /regexp/, "string";
may not be the way to ask perl to match a string with a regexp. And
it isn't. Look it up and you'll see that it is
($var) = "string" =~ /regexp/;

Now that still won't work, because you only get a list from a regexp
if you ask for all matches, which you do with the 'g' modifier. So
you want
($var) = "string" =~ /regexp/g;

The parenthesized items in your regexp match their counterpart in the
string, so after rewriting as I described, it will work.


I don't see much of a parser for $date. [0-9]+ seems to work here
for extracting that part of the string, as long as you're sure that
the first character after it is not a digit. You can use \d instead
of [0-9]; for plain ASCII data it means the same thing (on Unicode
strings, \d also matches digits from other scripts).
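If the eight digits after the cost (20041127 in the sample) really are a date in YYYYMMDD form, which is a guess and not something the poster stated, the pattern can pin that down instead of relying on [0-9]+ stopping at a non-digit. A minimal sketch under that assumption; the sub name parse_line is hypothetical and the sample product name is shortened:

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: the date is exactly 8 digits (YYYYMMDD)
# immediately following the two-decimal cost field.
sub parse_line {
    my ($line) = @_;
    my @f = $line =~
        /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9]{2}) +([0-9]+\.[0-9]{2})([0-9]{4})([0-9]{2})([0-9]{2})(.*)/s;
    return unless @f;    # empty list: the line did not match
    my ($prodcode, $custname, $qty, $cost, $y, $m, $d, $prodname) = @f;
    return ($prodcode, $custname, $qty, $cost, "$y-$m-$d", $prodname);
}

my @rec = parse_line("12031361 ABC3 567.00 5177.6620041127\xbd\xba12x20's");
print join(',', @rec[0 .. 4]), "\n";   # prints 12031361,ABC3,567.00,5177.66,2004-11-27
```

A line whose date field is not exactly eight digits followed by a non-digit will now fail to match rather than silently producing a wrong split.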
 

Anno Siegel

sam said:
Hi,

I would like to write a perl script to parse each line read from a text
file.
I ended up some perl code as shown below:

($prodcode,$custname,$qty,$cost,$date,$prodname) =
/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,

Up to here, it looks like a regex of sorts, but what is this:
"12031361 ABC3 567.00 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";
print "Result: ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

Use string interpolation, not concatenation, if there are lots of
variables. Better yet, collect the result in an array @data, then
say

print "Result: ", join( ',', @data), "\n";
if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
$date eq "" or $prodname eq "") {
print "Failed to parse input file.\n";
exit;
}

...and this could be written

print "Failed to parse input file.\n" if grep length() == 0, @data;
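A sketch of the @data approach, using the sample line with its non-ASCII product name replaced by a placeholder (XYZ). One caveat worth knowing: when the match fails, @data is simply empty, so the grep test alone finds nothing to complain about. Checking the array itself catches the failed match; the grep then only catches fields that matched as empty strings.

```perl
use strict;
use warnings;

# Sample line; "XYZ" is a placeholder for the original non-ASCII bytes.
my $line = "12031361 ABC3 567.00 5177.6620041127XYZ 12x20's";

# Capture all six fields into one array instead of six scalars.
my @data = $line =~
    /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/;

if (!@data) {
    # A failed match yields an empty list, which grep alone would not flag.
    print "Failed to parse input file.\n";
}
elsif (grep { length($_) == 0 } @data) {
    # Only the last capture, (.*), can actually match an empty string.
    print "Empty field in input.\n";
}
else {
    print "Result: ", join(',', @data), "\n";
}
```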
But the parser failed to parse the input text, it returns empty string.
What is wrong with the above code, especially the parser I created for
parsing the $date.

Which part of the regex is supposed to parse a date, and in what format?
What does the input data look like anyway? It's probably possible to
infer that from the (mangled) code you've given, but I'm not going to.

Anno
 

Anno Siegel

[...]
($var) = "string" =~ /regexp/;

Now that still won't work, because you only get a list from a regexp
if you ask for all matches, which you do with the 'g' modifier. So

That is not true. /g is only needed when the regex doesn't capture
anything. If it does, the captures will be delivered in list context.
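To illustrate the correction with a toy pattern (not the poster's): in list context a successful match without /g returns its capture groups, /g returns the captures from every match, a pattern with no captures returns (1) on success, and a failed match returns the empty list.

```perl
use strict;
use warnings;

my $s = "ab12 cd34";

# No /g: one match, returns its capture groups.
my ($word, $num) = $s =~ /([a-z]+)(\d+)/;

# With /g: returns the captures from every match, flattened into one list.
my @all = $s =~ /([a-z]+)(\d+)/g;

# No captures and no /g: list context returns (1) on success.
my @flag = $s =~ /\d/;

# Failure returns the empty list, so the variable stays undef.
my ($x) = $s =~ /(xyz)/;

print "$word $num | @all | @flag | ", defined $x ? $x : 'undef', "\n";
# prints: ab 12 | ab 12 cd 34 | 1 | undef
```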

Anno
 

Arndt Jonasson

[...]
($var) = "string" =~ /regexp/;

Now that still won't work, because you only get a list from a regexp
if you ask for all matches, which you do with the 'g' modifier. So

That is not true. /g is only needed when the regex doesn't capture
anything. If it does, the captures will be delivered in list context.

Oops. I'm sorry for being misleading. Clearly described in the regexp
section, too...
 

Brian McCauley

sam said:
I ended up some perl code as shown below:

($prodcode,$custname,$qty,$cost,$date,$prodname) =
/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
"12031361 ABC3 567.00 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";

What are you expecting the comma operator in the above code to do?
Where did you get this expectation? Compare your expectation to what
comma actually does (RTFM). Compare it also to the =~ operator, which
does do what I'm guessing you think the comma does, but its operands
are the other way around.
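A small sketch of the difference, using a made-up two-field pattern. Because = binds more tightly than the comma, the original statement parses as (LIST = /regexp/), "string"; the bare /.../ matches against $_ (unset here), and the string is evaluated in void context and thrown away, which is where the "Useless use of a constant in void context" warning comes from.

```perl
use strict;
use warnings;

my $line = "12031361 ABC3";

# Wrong: '=' binds tighter than ',', so this is really
#   (($code, $name) = /.../), $line;
# The match runs against $_ and $line is discarded.
my ($code, $name);
{
    no warnings;    # silence the void-context and undef warnings for the demo only
    local $_;       # make sure $_ is undef here
    ($code, $name) = /^(\d+) +(\w+)/, $line;
}
print defined $code ? "matched\n" : "comma version: no match\n";

# Right: =~ binds the string (left operand) to the pattern (right operand).
my ($code2, $name2) = $line =~ /^(\d+) +(\w+)/;
print "=~ version: $code2 $name2\n";
# prints: comma version: no match
#         =~ version: 12031361 ABC3
```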

You should always compile Perl with strictures and warnings enabled.
Perl would then have told you something was wrong.

You should always declare all variables as lexically scoped in the
smallest applicable scope. This means there's a 95% chance that you
should have had a my() in there.
print "Result: ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

Why have you obfuscated this?

print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";
if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
$date eq "" or $prodname eq "") {
print "Failed to parse input file.\n";
exit;
}

There is no way any of those variables except $prodname can be an empty
string. If the match succeeds then all the others must be non-empty,
as none of the other captures can match the empty string. If the
match fails then all the variables will be undefined. Although (undef
eq '') is true, it makes your code clearer if you test definedness with
defined(). (It also avoids a warning.) It is also only necessary to
check the definedness of one of the variables. Better still, just use
the return value of the list assignment statement, which will be true if
the match succeeded.
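A minimal sketch of why that works, with a toy pattern over the two price fields: a list assignment in scalar (boolean) context evaluates to the number of elements on its right-hand side, so it is 0 when the match fails and the number of captures when it succeeds. The sub name match_count is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of the list assignment itself,
# i.e. the number of elements the match produced (2 on success, 0 on failure).
sub match_count {
    my ($input) = @_;
    my $count = (my ($qty, $cost) =
        $input =~ /^([0-9]+\.[0-9]{2}) +([0-9]+\.[0-9]{2})/);
    print $count ? "ok: $qty, $cost\n" : "no match: $input\n";
    return $count;
}

match_count("567.00 5177.66");     # prints "ok: 567.00, 5177.66"
match_count("no numbers here");    # prints "no match: no numbers here"
```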
But the parser failed to parse the input text, it returns empty string.

This is nonsense: there is no return value from your code.
What is wrong with the above code, especially the parser I created for
parsing the $date.

The parser you created for parsing $date was not included in the code
you posted, so we can't possibly comment.

#!/usr/bin/perl
use strict;
use warnings;

$_ = "12031361 ABC3 567.00 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";

if ( my ($prodcode,$custname,$qty,$cost,$date,$prodname) =
        /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/ ) {

    print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";
} else {
    print "Failed to parse input file.\n";
    exit;
}
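The code above handles one hard-coded string, while the original goal was to parse each line of a text file. A hedged sketch of the same pattern in a read loop, assuming one record per line; it uses an in-memory filehandle so it runs on its own (the sample uses the placeholder XYZ for the non-ASCII bytes), and the /x modifier so the long regex can span several commented lines. The sub name parse_fh is hypothetical.

```perl
use strict;
use warnings;

# Same six fields as above. Under /x, whitespace in the pattern is ignored,
# but a space inside a bracketed class like [ ] is still literal.
my $record_re = qr/
    ^([0-9\-]+)          [ ]+   # product code
    ([A-Za-z0-9\-]+)     [ ]+   # customer name
    ([0-9]+\.[0-9]{2})   [ ]+   # quantity
    ([0-9]+\.[0-9]{2})          # cost
    ([0-9]+)                    # date digits (format not confirmed)
    (.*)                        # product name: rest of the line
/x;

# Hypothetical helper: parse every line from an open filehandle and
# return one array ref per line that matched.
sub parse_fh {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        if (my @f = $line =~ $record_re) {
            push @records, \@f;
        }
        else {
            warn "Failed to parse line $.\n";
        }
    }
    return @records;
}

# Demo with an in-memory file; for real data, open the file instead:
#   open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $sample = "12031361 ABC3 567.00 5177.6620041127XYZ 12x20's\nnot a record\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @records = parse_fh($fh);
print "Result: ", join(',', @$_), "\n" for @records;
```

Lines that do not match are reported with their line number ($.) on STDERR instead of aborting the whole run.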
 

sam

Anno said:
[...]

Thanks very much. This is very helpful indeed.

Thanks
Sam
 
