help with some capturing syntax

Discussion in 'Perl Misc' started by Matt Williamson, Jul 7, 2006.

  1. Given the following, is there an easy way to preface the print $status,
    "\n"; line with job started, job ended or job completion status? I've been
    reading about capturing in the blue camel, but I can't figure out if
    or how to make it work.

    foreach my $line (@content){
        if ($line =~ /(?:job started|job ended|job completion status)/i) {
            $line =~ /:(.*)$/;
            my $status = $1;
            chomp $status;
            for ($status) {
                s/^\s+//;
                s/\s+$//;
            }
            print |insert the status that matched above| $status, "\n";
        }
    }

    TIA

    Matt
     
    Matt Williamson, Jul 7, 2006
    #1

  2. I figured it out. If there is a more efficient way to code it though, I'm
    open.

    foreach my $line (@content){
        if ($line =~ /((job started)|(job ended)|(job completion status))/i) {
            my $label = $1;
            $line =~ /:(.*)$/;
            my $status = $1;
            chomp $status;
            for ($status) {
                s/^\s+//;
                s/\s+$//;
            }
            print $label, " : ",$status, "\n";
        }
    }
     
    Matt Williamson, Jul 7, 2006
    #2

  3. Matt Williamson

    Paul Lalli Guest

    Adding the ?: above specifically makes this *not* capture. If you
    wanted to capture them, why are you specifically telling perl *not* to
    capture them? Capture it, and then assign a permanent variable to $1,
    so you can later print it out:

    if ($line =~ /(job started|job ended|job completion status)/i) {
    my $job_type = $1;

    More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);

    What is the point of a loop that iterates only once? Are you just
    trying to avoid writing "$status" twice instead of once? Does that
    really make sense to you?

    $status =~ s/^\s+//;
    $status =~ s/\s+$//;

    Of course, you could have equally well just not captured the whitespace
    in your original match.
    If I were to write this whole code, to do what I *think* you're trying
    to accomplish, it would look something like:

    foreach my $line (@content){
        if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
            my ($job_type, $status) = ($1, $2);
            print "$job_type: $status\n";
        }
    }

    Of course, without any sample input or output to go by, I'm only
    guessing.
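
    Since the thread never shows real input, here is a minimal runnable
    sketch of the loop above run against made-up data. The contents of
    @content and the @report array are assumptions added for
    illustration only; the regex is the one from this post.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real log format is not shown in the thread.
my @content = (
    "Job started: 2006-07-07 09:00:01\n",
    "Backup of /home in progress\n",
    "Job ended:   2006-07-07 09:12:44  \n",
    "Job completion status: Successful\n",
);

my @report;
foreach my $line (@content) {
    # $1 = which job phrase matched, $2 = text after the colon, already trimmed
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        my ($job_type, $status) = ($1, $2);
        push @report, "$job_type: $status";
    }
}

print "$_\n" for @report;
```

    Note that this pattern expects the colon to follow the job phrase
    directly, so the colons inside the timestamps do no harm.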

    Paul Lalli
     
    Paul Lalli, Jul 7, 2006
    #3
  4. Matt Williamson

    Paul Lalli Guest

    What do you think those three inner parentheses are doing?

    (see my previous post for a critique of the rest of the code)

    Paul Lalli
     
    Paul Lalli, Jul 7, 2006
    #4
  5. It's in perlfaq 4 that way. I'm quite new to this, so I can't really say
    what does or doesn't make sense. <g>


    This is much cleaner code and you've taught me a good bit by posting it.
    Thanks!
     
    Matt Williamson, Jul 7, 2006
    #5
  6. Matt Williamson

    Paul Lalli Guest

    Paul Lalli, Jul 7, 2006
    #6
  7. This is what it says in my version of Perlfaq 4

    ___

    How do I strip blank space from the beginning/end of a string?
    Although the simplest approach would seem to be

    $string =~ s/^\s*(.*?)\s*$/$1/;

    not only is this unnecessarily slow and destructive, it also fails with
    embedded newlines. It is much faster to do this operation in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

    Or more nicely written as:

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }

    This idiom takes advantage of the "foreach" loop's aliasing behavior to
    factor out common code. You can do this on several strings at once, or
    arrays, or even the values of a hash if you use a slice:

    # trim whitespace in the scalar, the array,
    # and all the values in the hash
    foreach ($scalar, @array, @hash{keys %hash}) {
        s/^\s+//;
        s/\s+$//;
    }

    ___

    Since it said "Or more nicely written as" I thought that must be the correct
    way to do it. It now seems that it's only better if you have multiple
    strings to use it on.
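
    To see the difference, a small sketch comparing the two forms may
    help. The variable contents below are invented for illustration;
    the substitutions and the foreach list are the ones quoted from the
    FAQ.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only to show the effect.
my $status = "  Successful \n";
my $scalar = "  one  ";
my @array  = ("\ttwo\n", " three ");
my %hash   = (key => "  four ");

# One string: two plain substitutions are enough.
$status =~ s/^\s+//;
$status =~ s/\s+$//;

# Several strings: foreach aliases $_ to each element in turn,
# so the substitutions modify the original variables in place.
foreach ($scalar, @array, @hash{keys %hash}) {
    s/^\s+//;
    s/\s+$//;
}

print "[$status] [$scalar] [@array] [$hash{key}]\n";
```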

    Matt
     
    Matt Williamson, Jul 7, 2006
    #7
  8. Matt Williamson

    Paul Lalli Guest

    Yeesh. I certainly am glad that FAQ's been updated. :)
    One of the things you should learn about Perl is that there's no such
    thing as "the" correct way to do anything. Indeed, one of Perl's
    mottos is "There Is More Than One Way To Do It". Obviously, someone
    thought (and probably still thinks) that aliasing the variable to $_ by
    means of a one-iteration foreach was a good way of doing it. I
    disagree, as it seems to be needlessly misleading. That doesn't mean
    that either my way or that old FAQ's way are "wrong".
    That is my opinion, yes. There is, of course, something to be said for
    extensibility. With the FAQ's way, your code is all set to have more
    variables added to it, just by typing them into the foreach's list.
    "My" way, you'd have to copy and paste code. You need to decide which
    you would rather give up: readability or extensibility.

    Paul Lalli
     
    Paul Lalli, Jul 7, 2006
    #8
  9. You are using non-capturing parentheses. If you want to capture the job
    status you have to use capturing parentheses:

    if ($line =~ /(job (?:started|ended|completion status))/i) {
        my $job_status = $1;

    You should only use the numerical variables after a successful match:

    my ( $status ) = $line =~ /:(.*)$/;

    /:(.*)$/ will not match a newline so unless you have changed the value of $/
    there is nothing for chomp to remove and in any case the s/\s+$//; later would
    remove any trailing newlines.
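
    A short sketch of both points, using a made-up input line (the
    variable names $captured, $has_newline and $missing are introduced
    here for illustration only):

```perl
use strict;
use warnings;

my $line = "Job ended: OK  \n";    # hypothetical input, newline still attached

# A match in list context returns its captures, so $1 is never consulted.
my ($captured) = $line =~ /:(.*)$/;

# "." does not match "\n" without /s, and "$" may match just before a
# trailing newline, so the capture holds no newline for chomp to remove.
my $has_newline = ($captured =~ /\n/) ? 1 : 0;

(my $status = $captured) =~ s/^\s+//;
$status =~ s/\s+$//;

# A line without a colon leaves the variable undefined rather than stale.
my ($missing) = "no separator here\n" =~ /:(.*)$/;

print "captured='$captured' newline=$has_newline status='$status'\n";
print "missing is ", (defined $missing ? "defined" : "undef"), "\n";
```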



    John
     
    John W. Krahn, Jul 7, 2006
    #9

  10. You have the same cardinal sin that the original had though.


    You should never use the dollar-digit variables unless you
    have first ensured that the match *succeeded*.

    If you ever get a $line with no colons, then $status will NOT have
    the status in it, it will have the same value as $label because
    $1 was set way back when _that_ match succeeded.


    die "no colon in '$line'" unless $line =~ /:(.*)$/;
    my $status = $1; # now it's safe to use $1


    This is a common mistake.
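
    The failure mode can be reproduced with a line that has a job
    phrase but no colon. This is only a sketch; the sample line and the
    names $unsafe and $safe are assumptions made for the demonstration.

```perl
use strict;
use warnings;

my $line = "job started with no separator";   # hypothetical line lacking a colon

my ($label, $unsafe);
if ($line =~ /(job started|job ended|job completion status)/i) {
    $label = $1;
    $line =~ /:(.*)$/;     # fails: no colon, so $1 is left untouched
    $unsafe = $1;          # still holds the label from the earlier match
}

# Safer: take the capture from the match's return list.
my ($safe) = $line =~ /:(.*)$/;    # undef when the match fails

print "label='$label' unsafe='$unsafe' safe=",
      (defined $safe ? "'$safe'" : "undef"), "\n";
```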

    I made the very same one here in 1995:

    Message-ID: <>

    Now I'm just returning the favor. :)
     
    Tad McClellan, Jul 7, 2006
    #10

  11. Even more succinctly written: my($status) = $line =~ /:(.*)$/;

    .... since there cannot be any newlines in $status anyway. :)
     
    Tad McClellan, Jul 7, 2006
    #11
  12. Matt Williamson

    Xicheng Jia Guest

    No need to guess if you change the above *if* statement to the
    following:

    if ($line =~ /(?=.*?(job (?:started|ended|completion status))).*?:\s*(.*?)\s*$/i)
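
    As a rough sketch of what the lookahead buys: it finds the job
    phrase anywhere in the line, while the rest of the pattern takes
    whatever follows the first colon. The sample line and the $loose and
    $strict names are invented for this comparison; $strict is the
    pattern from the earlier reply.

```perl
use strict;
use warnings;

# Lookahead version: job phrase and colon need not be adjacent.
my $loose  = qr/(?=.*?(job (?:started|ended|completion status))).*?:\s*(.*?)\s*$/i;
# Earlier version: the colon must directly follow the job phrase.
my $strict = qr/(job (?:started|ended|completion status)):\s*(.*?)\s*$/i;

# Hypothetical line where other text sits between the phrase and the colon.
my $line = "backup job ended, result: OK \n";

my ($type, $status) = $line =~ $loose;
my $strict_matched  = ($line =~ $strict) ? 1 : 0;

print "loose:  type='$type' status='$status'\n";
print "strict: ", ($strict_matched ? "matched" : "no match"), "\n";
```

    One caveat: the value is taken from the first colon on the line, so
    a timestamp such as 09:00:01 ahead of the real separator would
    change what gets captured.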

    Xicheng
     
    Xicheng Jia, Jul 8, 2006
    #12
  13. Matt Williamson

    Dr.Ruud Guest

    Paul Lalli wrote:
    I like it as

    s/^\s+//, s/\s+$// for $status;

    but I don't remember ever having checked that idiom for having
    optimization benefits or performance penalties.
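
    For what it's worth, a quick sketch showing that the statement
    modifier form trims in place just like the block form. The sample
    strings and the @others array are made up for the demonstration,
    and nothing here measures speed.

```perl
use strict;
use warnings;

my $status = "  Successful \n";          # hypothetical values
my @others = (" started\t", "ended  ");

# The "for" modifier aliases $_ to $status, then both substitutions run.
s/^\s+//, s/\s+$// for $status;

# The same form accepts a list, so more variables can be added later.
s/^\s+//, s/\s+$// for ($status, @others);

print "[$status] [@others]\n";
```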
     
    Dr.Ruud, Jul 9, 2006
    #13
