reg expression with input line

Discussion in 'Perl Misc' started by sam, Dec 23, 2004.

  1. sam

    sam Guest

    Hi,

    I would like to write a perl script to parse each line read from a text
    file.
    I ended up with some Perl code, shown below:

    ($prodcode,$custname,$qty,$cost,$date,$prodname) =
    /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
    +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
    "12031361 ABC3 567.00
    5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
    xbch)\xa4\xe9\xa5\xce12x20's";

    print "Result:
    ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

    if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
    $date eq "" or $prodname eq "") {
    print "Failed to parse input file.\n";
    exit;
    }

    But the parser fails to parse the input text; it returns empty strings.
    What is wrong with the above code, especially the part I wrote for
    parsing $date?

    Thanks
    Sam
    sam, Dec 23, 2004
    #1

  2. sam writes:
    >
    > I would like to write a perl script to parse each line read from a
    > text file.
    > I ended up some perl code as shown below:
    >
    > ($prodcode,$custname,$qty,$cost,$date,$prodname) =
    > /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
    > +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
    > "12031361 ABC3 567.00
    > 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
    > xbch)\xa4\xe9\xa5\xce12x20's";
    >
    > print "Result:
    > ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname
    > . "\n";
    >
    > if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
    > $date eq "" or $prodname eq "") {
    > print "Failed to parse input file.\n";
    > exit;
    > }
    >
    > But the parser failed to parse the input text, it returns empty string.
    > What is wrong with the above code, especially the parser I created for
    > parsing the $date.


    To begin with, you should ask perl for warnings, either with the -w
    option, or with the directive "use warnings;". Then it will tell you
    that you get uninitialized values on the "print" line. Your test already
    shows that, but you will see that in fact all variables are uninitialized
    (meaning their value is 'undef').

    It also tells you "Useless use of a constant in void context". It points
    out the line where the statement starts, not the place where the constant
    starts, but there is only one constant here anyway, and it's the data
    string.
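
    To see both warnings for yourself, here is a minimal sketch. It assumes a shortened pattern and data string (not the ones from the question) and collects the warnings through a $SIG{__WARN__} handler instead of letting them go to STDERR; the string eval is only there so the compile-time warning can be captured as well.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $var;
# Same shape as the statement in the question: a match, a comma, a string.
# Compiled at run time via string eval so the compile-time warning is
# delivered to the handler above.
eval q{ ($var) = /(\d+)/, "12031361 ABC3"; 1 } or die $@;

print "var is ", ( defined $var ? "'$var'" : 'undef' ), "\n";
print "warning: $_" for @warnings;
```

    The match is applied to $_ (which is undefined here), not to the string after the comma, so $var stays undefined.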

    The immediate suspicion is that
    ($var) = /regexp/, "string";
    may not be the way to ask perl to match a string with a regexp. And
    it isn't. Look it up and you'll see that it is
    ($var) = "string" =~ /regexp/;

    Now that still won't work, because you only get a list from a regexp
    if you ask for all matches, which you do with the 'g' modifier. So
    you want
    ($var) = "string" =~ /regexp/g;

    The parenthesized items in your regexp match their counterpart in the
    string, so after rewriting as I described, it will work.


    I don't see much of a parser to parse $date. [0-9]+ seems to work here
    for extracting that part of the string, as long as you're sure that
    the first following character is not a digit. You can use \d instead
    of [0-9]; for ASCII input it means the same thing.
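
    If the date always has a fixed width, the pattern can say so. The sketch below assumes the date is eight digits in YYYYMMDD form (a guess based on 20041127 in the sample) and uses a simplified, ASCII-only product name; neither assumption comes from the original post.

```perl
use strict;
use warnings;

# Simplified sample line (hypothetical): the product name follows the date
# directly, as in the original data, but is plain ASCII here.
my $line = "12031361 ABC3 567.00 5177.6620041127Widget 12x20's";

# \d{4}\d\d\d\d pins the date to exactly eight digits, so a product name
# that starts with a digit would not be swallowed into the date.
my ($cost, $year, $month, $day, $prodname) =
    $line =~ /(\d+\.\d\d)(\d{4})(\d\d)(\d\d)(.*)$/;

if ( defined $cost ) {
    print "cost=$cost date=$year-$month-$day name=$prodname\n";
} else {
    print "no match\n";
}
```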
    Arndt Jonasson, Dec 23, 2004
    #2

  3. sam

    Anno Siegel Guest

    sam wrote in comp.lang.perl.misc:
    > Hi,
    >
    > I would like to write a perl script to parse each line read from a text
    > file.
    > I ended up some perl code as shown below:
    >
    > ($prodcode,$custname,$qty,$cost,$date,$prodname) =
    > /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
    > +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,


    Up to here, it looks like a regex of sorts, but what is this:

    > "12031361 ABC3 567.00
    > 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
    > xbch)\xa4\xe9\xa5\xce12x20's";


    > print "Result:
    > ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";


    Use string interpolation, not concatenation if there are lots of
    variables. Better yet, collect the result in an array @data, then
    say

    print "Result: ", join( ',', @data), "\n";
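
    A minimal sketch of the @data approach, assuming the string is bound to the pattern with =~ and using a simplified, ASCII-only sample line (the product name here is made up):

```perl
use strict;
use warnings;

# Hypothetical sample line; the real data has a non-ASCII product name.
my $line = "12031361 ABC3 567.00 5177.6620041127Widget 12x20's";

# In list context the match returns the six captures, or an empty list
# if the pattern does not match.
my @data = $line =~
    /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/;

if (@data) {
    print "Result: ", join( ',', @data ), "\n";
} else {
    print "Failed to parse input line.\n";
}
```

    Testing @data itself catches a failed match, because the array is then empty.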

    > if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
    > $date eq "" or $prodname eq "") {
    > print "Failed to parse input file.\n";
    > exit;
    > }


    ...and this could be written

    print "Failed to parse input file.\n" if grep length() == 0, @data;

    > But the parser failed to parse the input text, it returns empty string.
    > What is wrong with the above code, especially the parser I created for
    > parsing the $date.


    Which part of the regex is supposed to parse a date, and in what format?
    What does the input data look like anyway? It's probably possible to
    infer that from the (mangled) code you've given, but I'm not going to.

    Anno
    Anno Siegel, Dec 23, 2004
    #3
  4. sam

    Anno Siegel Guest

    Arndt Jonasson wrote in comp.lang.perl.misc:

    [...]

    > ($var) = "string" =~ /regexp/;
    >
    > Now that still won't work, because you only get a list from a regexp
    > if you ask for all matches, which you do with the 'g' modifier. So


    That is not true. /g is only needed when the regex doesn't capture
    anything. If it does, the captures will be delivered in list context.
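
    The three cases side by side, as a small sketch with a made-up test string:

```perl
use strict;
use warnings;

my $text = "qty 567 cost 5177";

# With capture groups, a list-context match returns the captures; no /g needed.
my @captures = $text =~ /qty (\d+) cost (\d+)/;

# Without capture groups, a successful list-context match returns just (1).
my @plain = $text =~ /\d+/;

# With /g and no capture groups, it returns every match of the whole pattern.
my @all = $text =~ /\d+/g;

print "captures: @captures\n";
print "plain:    @plain\n";
print "all:      @all\n";
```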

    Anno
    Anno Siegel, Dec 23, 2004
    #4
  5. Anno Siegel writes:
    > Arndt Jonasson wrote in comp.lang.perl.misc:
    >
    > [...]
    >
    > > ($var) = "string" =~ /regexp/;
    > >
    > > Now that still won't work, because you only get a list from a regexp
    > > if you ask for all matches, which you do with the 'g' modifier. So

    >
    > That is not true. /g is only needed when the regex doesn't capture
    > anything. If it does, the captures will be delivered in list context.


    Oops. I'm sorry for being misleading. Clearly described in the regexp
    section, too...
    Arndt Jonasson, Dec 23, 2004
    #5
  6. sam wrote:

    > I ended up some perl code as shown below:
    >
    > ($prodcode,$custname,$qty,$cost,$date,$prodname) =
    > /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
    > +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
    > "12031361 ABC3 567.00
    > 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
    >
    > xbch)\xa4\xe9\xa5\xce12x20's";


    What are you expecting the comma operator in the above code to do?
    Where did you get this expectation? Compare your expectation to what
    comma actually does (RTFM). Compare it also to the =~ operator, which
    does do what I'm guessing you think the comma does, but its operands
    are the other way around.

    You should always compile Perl with strictures and warnings enabled.
    Perl would then have told you something was wrong.

    You should always declare all variables as lexically scoped in the
    smallest applicable scope. This means there's a 95% chance that you
    should have had a my() in there.

    > print "Result:
    > ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname .
    > "\n";


    Why have you obfuscated this?

    print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";

    >
    > if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
    > $date eq "" or $prodname eq "") {
    > print "Failed to parse input file.\n";
    > exit;
    > }


    There is no way any of those variables except $prodname can be an empty
    string. If the match succeeds, then all the others must be non-empty,
    as none of the other captures can match the empty string. If the
    match fails, then all the variables will be undefined. Although (undef
    eq '') is true, it makes your code clearer if you test definedness with
    defined(). (It also avoids a warning.) It is also only necessary to
    check the definedness of one of the variables. Better still, just use
    the return value of the list assignment statement, which will be true if
    the match succeeded.
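
    A short sketch of what the list assignment returns, using a cut-down two-field pattern and made-up input lines rather than the ones from the question:

```perl
use strict;
use warnings;

my @counts;
for my $line ( "ABC3 567.00", "no numbers here" ) {
    # A list assignment in scalar context yields the number of elements on
    # its right-hand side: 2 captures on a match, 0 when the match fails.
    my $count = ( my ($name, $qty) = $line =~ /^(\w+) +(\d+\.\d\d)$/ );
    push @counts, $count;

    if ($count) {
        print "matched: name=$name qty=$qty\n";
    } else {
        # After a failed match the variables are undef, not empty strings.
        print "no match: name is ", ( defined $name ? 'defined' : 'undef' ), "\n";
    }
}
```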

    > But the parser failed to parse the input text, it returns empty string.


    This is nonsense; there is no return value from your code.

    > What is wrong with the above code, especially the parser I created for
    > parsing the $date.


    The parser you created for parsing $date was not included in the code
    you posted, so we can't possibly comment.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # The sample record; the pattern below is matched against $_.
    $_ = "12031361 ABC3 567.00 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";

    # The list assignment is true only if the match succeeded.
    if ( my ($prodcode, $custname, $qty, $cost, $date, $prodname) =
         /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/ ) {
        print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";
    } else {
        print "Failed to parse input file.\n";
        exit;
    }
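
    Since the goal was to parse each line read from a text file, the same test can sit inside a read loop. This is only a sketch: the records are hypothetical ASCII-only samples, and an in-memory filehandle stands in for the real file so the example is self-contained.

```perl
use strict;
use warnings;

# Stand-in for the real input file (hypothetical records).
my $input = join "\n",
    "12031361 ABC3 567.00 5177.6620041127Widget 12x20's",
    "this line does not match",
    "12031362 XYZ-9 1.50 20.0020041201Bolt M8",
    "";

open my $fh, '<', \$input or die "Cannot open input: $!";

my @records;
while ( my $line = <$fh> ) {
    chomp $line;
    if ( my @fields = $line =~
         /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/ ) {
        push @records, [@fields];
        print "Result: ", join( ',', @fields ), "\n";
    } else {
        # Report the bad line and carry on instead of exiting.
        warn "Line $.: failed to parse\n";
    }
}
close $fh;
```

    For a real file, replace \$input with the file name in the open call.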
    Brian McCauley, Dec 23, 2004
    #6
  7. sam

    sam Guest

    Anno Siegel wrote:

    > sam wrote in comp.lang.perl.misc:
    >
    >>Hi,
    >>
    >>I would like to write a perl script to parse each line read from a text
    >>file.
    >>I ended up some perl code as shown below:
    >>
    >>($prodcode,$custname,$qty,$cost,$date,$prodname) =
    >> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
    >>+([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,

    >
    >
    > Up to here, it looks like a regex of sorts, but what is this:
    >
    >
    >> "12031361 ABC3 567.00
    >>5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
    >>xbch)\xa4\xe9\xa5\xce12x20's";

    >
    >
    >>print "Result:
    >>".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

    >
    >
    > Use string interpolation, not concatenation if there are lots of
    > variables. Better yet, collect the result in an array @data, then
    > say
    >
    > print "Result: ", join( ',', @data), "\n";
    >
    >
    >>if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
    >>$date eq "" or $prodname eq "") {
    >> print "Failed to parse input file.\n";
    >> exit;
    >>}

    >
    >
    > ...and this could be written
    >
    > print "Failed to parse input file.\n" if grep length() == 0, @data;
    >

    Thanks very much. This is very helpful indeed.

    Thanks
    Sam

    >
    >
    > Anno
    sam, Dec 23, 2004
    #7