Can this be combined into one statement?

Discussion in 'Perl Misc' started by John Black, Oct 28, 2013.

  1. John Black

    John Black Guest

    Simple question I think. I have a string $line that has some number of fields separated by
    one or more spaces. The filename is the last field on the line and I want to grab it. There
    must be a way to write these two lines as one line (skipping the intermediate @line_arr
    step):

    @line_arr = split(/\s+/, $line);
    $file = $line_arr[-1];

    but I've tried various syntaxes like:

    $file = split(/\s+/, $line)[-1];

    but I have not hit upon a working syntax. What is it? Thanks.

    John Black
     
    John Black, Oct 28, 2013
    #1

  2. John Black <> wrote:
    >Simple question I think. I have a string $line that has some number of fields separated by
    >one or more spaces. The filename is the last field on the line and I want to grab it. There
    >must be a way to write these two lines as one line (skipping the intermediate @line_arr
    >step):
    >
    > @line_arr = split(/\s+/, $line);
    > $file = $line_arr[-1];
    >
    >but I've tried various syntaxes like:
    >
    > $file = split(/\s+/, $line)[-1];


    You almost got it:
    $file = (split(/\s+/, $line))[-1];

    jue
     
    Jürgen Exner, Oct 28, 2013
    #2
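    For reference, the working form is a list slice: the extra parentheses put split in list context, and [-1] then picks the last element of the resulting list. A minimal sketch, assuming a made-up sample line (the contents of $line are hypothetical, not from the original post):

```perl
use strict;
use warnings;

# Hypothetical sample line: fields separated by one or more spaces.
my $line = "-rw-r--r--  1 jblack  users  1024  report.txt";

# List slice: parenthesize the split call, then index the resulting list.
my $file = (split /\s+/, $line)[-1];

print "$file\n";   # prints "report.txt"
```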

  3. Dr.Ruud

    Dr.Ruud Guest

    On 2013-10-28 23:03, John Black wrote:

    > Simple question I think. I have a string $line that has some number of fields separated by
    > one or more spaces. The filename is the last field on the line and I want to grab it. There
    > must be a way to write these two lines as one line (skipping the intermediate @line_arr
    > step):
    >
    > @line_arr = split(/\s+/, $line);
    > $file = $line_arr[-1];
    >
    > but I've tried various syntaxes like:
    >
    > $file = split(/\s+/, $line)[-1];
    >
    > but I have not hit upon a working syntax. What is it? Thanks.


    There are several ways to get there, examples:

    $file = ( split " ", $line )[-1];

    ( $file ) = $line =~ /.*(\S+)/;


    --
    Ruud
     
    Dr.Ruud, Oct 28, 2013
    #3
  4. Dr.Ruud

    Dr.Ruud Guest

    On 2013-10-28 23:03, John Black wrote:

    > I have a string $line that has some number of fields separated by
    > one or more spaces. The filename is the last field on the line
    > [...] I've tried various syntaxes like:
    >
    > $file = split(/\s+/, $line)[-1];
    >
    > but I have not hit upon a working syntax. What is it? Thanks.


    There are several ways to get there, examples:

    $file = ( split " ", $line )[ -1 ];

    ( $file ) = $line =~ /.* (\S+) /x;


    Also see Text::CSV_XS.

    --
    Ruud
     
    Dr.Ruud, Oct 29, 2013
    #4
  5. Dr.Ruud

    Dr.Ruud Guest

    On 2013-10-29 00:56, Eli the Bearded wrote:
    > In comp.lang.perl.misc, Dr.Ruud <> wrote:


    >> ( $file ) = $line =~ /.*(\S+)/;

    >
    > $ perl -wle '$line = "a\tb c\td e\tfilename";
    > ($file) = $line =~ /.*(\S+)/;
    > print $file'
    > e
    >
    > Did you actually try your examples?


    Yeah, but only badly.

    ($file) = $line =~ /.*\s(\S+)/;

    --
    Ruud
     
    Dr.Ruud, Oct 29, 2013
    #5
  6. Dr.Ruud

    Dr.Ruud Guest

    On 2013-10-29 01:01, Dr.Ruud wrote:
    > On 2013-10-28 23:03, John Black wrote:


    > ( $file ) = $line =~ /.* (\S+) /x;


    Correction:

    ( $file ) = $line =~ /.*\s (\S+) /x;

    --
    Ruud
     
    Dr.Ruud, Oct 29, 2013
    #6
  7. Dr.Ruud

    Dr.Ruud Guest

    On 2013-10-29 00:56, Eli the Bearded wrote:
    > In comp.lang.perl.misc, Dr.Ruud <> wrote:


    >> $file = ( split " ", $line )[-1];

    >
    > $ perl -wle '$line = "a\tb c\td e\tfilename";
    > $file = ( split " ", $line )[-1];
    > print $file'
    > filename


    That works as intended; see "perldoc -f split" on the special case of a
    single space as the first argument to split.


    But taking the original post literally ("spaces"), it should capture
    "e\tfilename", so the split would then have to be done with / +/.

    --
    Ruud
     
    Dr.Ruud, Oct 29, 2013
    #7
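    The difference between the three separators can be checked with a short sketch. It reuses the tab-and-space test string quoted above; the variable names and the "  x y" string are only illustrative:

```perl
use strict;
use warnings;

# Test string from the post above: tabs and spaces mixed.
my $line = "a\tb c\td e\tfilename";

my $last_any_ws = (split ' ',  $line)[-1];   # any whitespace separates fields
my $last_spaces = (split / +/, $line)[-1];   # only literal spaces separate fields

# split ' ' also skips leading whitespace; /\s+/ yields a leading empty field.
my @awk_mode   = split ' ',   "  x y";
my @regex_mode = split /\s+/, "  x y";

print "$last_any_ws\n";                                              # filename
print scalar(@awk_mode), " vs ", scalar(@regex_mode), " fields\n";   # 2 vs 3 fields
```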
  8. John Black <> writes:
    > Simple question I think. I have a string $line that has some number of fields separated by
    > one or more spaces. The filename is the last field on the line and I want to grab it. There
    > must be a way to write these two lines as one line (skipping the intermediate @line_arr
    > step):
    >
    > @line_arr = split(/\s+/, $line);
    > $file = $line_arr[-1];
    >
    > but I've tried various syntaxes like:
    >
    > $file = split(/\s+/, $line)[-1];
    >
    > but I have not hit upon a working syntax.


    Grab a sequence of non-whitespace characters anchored at the end of the
    string?

    $line =~ /(\S+)$/
     
    Rainer Weikusat, Oct 29, 2013
    #8
  9. John Black <> writes:
    > Simple question I think. I have a string $line that has some number of fields separated by
    > one or more spaces. The filename is the last field on the line and I want to grab it. There
    > must be a way to write these two lines as one line (skipping the intermediate @line_arr
    > step):
    >
    > @line_arr = split(/\s+/, $line); [*]
    > $file = $line_arr[-1];
    >
    > but I've tried various syntaxes like:
    >
    > $file = split(/\s+/, $line)[-1]; [**]
    >
    > but I have not hit upon a working syntax.


    Since nobody wrote this so far: The first split call ([*]) runs split in
    list context, hence, it returns a list of strings created by it. But the
    second ([**]) runs it in scalar context and then, it splits into @_ and
    returns the number of fields found in the input.

    $file = (split(' ', $line))[-1]

    works as intended because the outer parentheses make it a list slice,
    so the split is evaluated in list context.
     
    Rainer Weikusat, Oct 29, 2013
    #9
  10. John Black

    John Black Guest

    In article <>,
    says...
    >
    > John Black <> writes:
    > > Simple question I think. I have a string $line that has some number of fields separated by
    > > one or more spaces. The filename is the last field on the line and I want to grab it. There
    > > must be a way to write these two lines as one line (skipping the intermediate @line_arr
    > > step):
    > >
    > > @line_arr = split(/\s+/, $line);
    > > $file = $line_arr[-1];
    > >
    > > but I've tried various syntaxes like:
    > >
    > > $file = split(/\s+/, $line)[-1];
    > >
    > > but I have not hit upon a working syntax.

    >
    > Grab a sequence of non-whitespace characters anchored at the end of the
    > string?
    >
    > $line =~ /(\S+)$/


    ooo, nice. However, if there happens to be any whitespace between the last field and the end
    of the line, I don't think this will work. But I think the split method would still be ok.
    I don't know if there ever will be any spaces after the filename but probably better to use
    code that would handle it.

    John Black
     
    John Black, Oct 29, 2013
    #10
  11. John Black

    John Black Guest

    In article <>, says...
    >
    > John Black <> wrote:
    > >Simple question I think. I have a string $line that has some number of fields separated by
    > >one or more spaces. The filename is the last field on the line and I want to grab it. There
    > >must be a way to write these two lines as one line (skipping the intermediate @line_arr
    > >step):
    > >
    > > @line_arr = split(/\s+/, $line);
    > > $file = $line_arr[-1];
    > >
    > >but I've tried various syntaxes like:
    > >
    > > $file = split(/\s+/, $line)[-1];

    >
    > You almost got it:
    > $file = (split(/\s+/, $line))[-1];
    >
    > jue


    Thanks all. This works. The answer was kind of obvious but I thought I tried that. Maybe I
    put the first open paren after the split?

    John Black
     
    John Black, Oct 29, 2013
    #11
  12. On 2013-10-29 14:10, Rainer Weikusat <> wrote:
    > John Black <> writes:
    >> @line_arr = split(/\s+/, $line); [*]
    >> $file = $line_arr[-1];
    >>
    >> but I've tried various syntaxes like:
    >>
    >> $file = split(/\s+/, $line)[-1]; [**]
    >>
    >> but I have not hit upon a working syntax.

    >
    > Since nobody wrote this so far: The first split call ([*]) runs split in
    > list context, hence, it returns a list of strings created by it. But the
    > second ([**]) runs it in scalar context


    On my systems (perl 5.8.0 to 5.14.2) the second call doesn't run at all.
    It's a syntax error.

    hp


    --
    Peter J. Holzer | http://www.hjp.at/
    "The curse of electronic word processing: you keep reworking your text
    until the parts of the sentence no longer fit together." -- Ralph Babel
     
    Peter J. Holzer, Oct 29, 2013
    #12
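    Both points can be checked on a given perl with a small sketch. It assumes Perl 5.12 or later, where split in scalar context returns the field count without touching @_. The failing statement is wrapped in a string eval so that the rest of the script still compiles:

```perl
use strict;
use warnings;

my $line = "alpha beta gamma";

# The original attempt, compiled at run time so the syntax error is trapped in $@.
my $compiled = eval q{ my $f = split(/\s+/, $line)[-1]; 1 };
print "syntax error: ", ($compiled ? "no" : "yes"), "\n";

# Scalar context: split returns the number of fields, not a field.
my $count = split /\s+/, $line;
print "fields: $count\n";

# List slice: the parentheses supply list context, [-1] picks the last field.
my $last = (split /\s+/, $line)[-1];
print "last: $last\n";
```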
  13. John Black <> writes:
    > In article <>,
    > says...
    >>
    >> John Black <> writes:
    >> > Simple question I think. I have a string $line that has some number of fields separated by
    >> > one or more spaces. The filename is the last field on the line and I want to grab it. There
    >> > must be a way to write these two lines as one line (skipping the intermediate @line_arr
    >> > step):
    >> >
    >> > @line_arr = split(/\s+/, $line);
    >> > $file = $line_arr[-1];
    >> >
    >> > but I've tried various syntaxes like:
    >> >
    >> > $file = split(/\s+/, $line)[-1];
    >> >
    >> > but I have not hit upon a working syntax.

    >>
    >> Grab a sequence of non-whitespace characters anchored at the end of the
    >> string?
    >>
    >> $line =~ /(\S+)$/

    >
    > ooo, nice. However, if there happens to be any whitespace between the last field and the end
    > of the line, I don't think this will work.


    $line =~ /(\S+)\s*$/
     
    Rainer Weikusat, Oct 29, 2013
    #13
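    A quick sketch of the trailing-whitespace case discussed above. The sample lines are made up; the point is only that /(\S+)$/ fails when blanks follow the last field, while /(\S+)\s*$/ still captures it:

```perl
use strict;
use warnings;

my $clean    = "a b file.txt";
my $trailing = "a b file.txt   \n";   # hypothetical line with trailing blanks

my ($strict_clean)    = $clean    =~ /(\S+)$/;
my ($strict_trailing) = $trailing =~ /(\S+)$/;       # no match, so undef
my ($loose_trailing)  = $trailing =~ /(\S+)\s*$/;

print "$strict_clean\n";
print defined $strict_trailing ? "matched\n" : "no match\n";
print "$loose_trailing\n";
```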
  14. John Black

    John Black Guest

    In article <>,
    says...
    >
    > John Black <> writes:
    > > In article <>,
    > > says...
    > >>
    > >> John Black <> writes:
    > >> > Simple question I think. I have a string $line that has some number of fields separated by
    > >> > one or more spaces. The filename is the last field on the line and I want to grab it. There
    > >> > must be a way to write these two lines as one line (skipping the intermediate @line_arr
    > >> > step):
    > >> >
    > >> > @line_arr = split(/\s+/, $line);
    > >> > $file = $line_arr[-1];
    > >> >
    > >> > but I've tried various syntaxes like:
    > >> >
    > >> > $file = split(/\s+/, $line)[-1];
    > >> >
    > >> > but I have not hit upon a working syntax.
    > >>
    > >> Grab a sequence of non-whitespace characters anchored at the end of the
    > >> string?
    > >>
    > >> $line =~ /(\S+)$/

    > >
    > > ooo, nice. However, if there happens to be any whitespace between the last field and the end
    > > of the line, I don't think this will work.

    >
    > $line =~ /(\S+)\s*$/


    Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
    split which ends up extracting a bunch of fields that are never used here.

    John Black

     
    John Black, Oct 29, 2013
    #14
  15. On 2013-10-29 18:36, John Black <> wrote:
    > In article <>,
    > says...
    >> John Black <> writes:
    >> > In article <>,
    >> > says...
    >> >> John Black <> writes:
    >> >> > @line_arr = split(/\s+/, $line);
    >> >> > $file = $line_arr[-1];
    >> >> >
    >> >> > but I've tried various syntaxes like:
    >> >> >
    >> >> > $file = split(/\s+/, $line)[-1];

    [...]
    >>
    >> $line =~ /(\S+)\s*$/

    >
    > Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
    > split which ends up extracting a bunch of fields that are never used here.


    OTOH the regexp probably needs to do a lot of backtracking, so you might
    lose that bet.

    Let's see:


    #!/usr/bin/perl
    use warnings;
    use strict;

    use Benchmark ':all';

    my @lines;

    for (1 .. 1000) {
        my $line = "";
        my $nwords = rand(10) + 1;
        for my $iw (1 .. $nwords) {
            $line .= "a" x (rand(10) + 1);
            $line .= " " x (rand(3) + ($iw < $nwords));
        }
        push @lines, $line;
    }

    cmpthese(-5,
        {
            'split' => sub {
                for my $line (@lines) {
                    my $file = (split(/\s+/, $line))[-1];
                }
            },
            'match' => sub {
                for my $line (@lines) {
                    my ($file) = $line =~ /(\S+)\s*$/;
                }
            }
        }
    );
    __END__

           Rate match split
    match 208/s    --  -67%
    split 625/s  200%    --


    Yup, split is about 3 times faster for this particular set of strings
    (may be wildly different for other strings).

    hp


     
    Peter J. Holzer, Oct 29, 2013
    #15
  16. John Black

    John Black Guest

    In article <>, says...
    >
    > On 2013-10-29 18:36, John Black <> wrote:
    > > In article <>,
    > > says...
    > >> John Black <> writes:
    > >> > In article <>,
    > >> > says...
    > >> >> John Black <> writes:
    > >> >> > @line_arr = split(/\s+/, $line);
    > >> >> > $file = $line_arr[-1];
    > >> >> >
    > >> >> > but I've tried various syntaxes like:
    > >> >> >
    > >> >> > $file = split(/\s+/, $line)[-1];

    > [...]
    > >>
    > >> $line =~ /(\S+)\s*$/

    > >
    > > Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
    > > split which ends up extracting a bunch of fields that are never used here.

    >
    > OTOH the regexp probably needs to do a lot of backtracking, so you might
    > lose that bet.
    >
    > Let's see:
    >
    >
    > #!/usr/bin/perl
    > use warnings;
    > use strict;
    >
    > use Benchmark ':all';
    >
    > my @lines;
    >
    > for (1 .. 1000) {
    >     my $line = "";
    >     my $nwords = rand(10) + 1;
    >     for my $iw (1 .. $nwords) {
    >         $line .= "a" x (rand(10) + 1);
    >         $line .= " " x (rand(3) + ($iw < $nwords));
    >     }
    >     push @lines, $line;
    > }
    >
    > cmpthese(-5,
    >     {
    >         'split' => sub {
    >             for my $line (@lines) {
    >                 my $file = (split(/\s+/, $line))[-1];
    >             }
    >         },
    >         'match' => sub {
    >             for my $line (@lines) {
    >                 my ($file) = $line =~ /(\S+)\s*$/;
    >             }
    >         }
    >     }
    > );
    > __END__
    >
    >        Rate match split
    > match 208/s    --  -67%
    > split 625/s  200%    --
    >
    >
    > Yup, split is about 3 times faster for this particular set of strings
    > (may be wildly different for other strings).
    >
    > hp


    This is one of the things I love about math and computers. You can prove your case. I stand
    corrected. My laptop got:

           Rate match split
    match 261/s    --  -39%
    split 427/s   64%    --

    BTW, what is the -5 option doing in the cmpthese function? I thought the first param was the
    number of iterations, but then negative doesn't make sense?

    John Black
     
    John Black, Oct 29, 2013
    #16
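    On the -5: per the Benchmark documentation, a negative COUNT is read as a minimum number of CPU seconds to run each piece of code, not as an iteration count (zero means the default of 3 seconds). A short sketch with timethese, which takes the same COUNT argument as cmpthese; the two subs and the sample line are placeholders, not the benchmark from the post above:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $line = "aaa bbb ccc ddd";   # placeholder input

# -1 means "run each sub for at least 1 CPU second"; 'none' suppresses the report.
my $results = timethese(-1, {
    'split' => sub { my $file = (split /\s+/, $line)[-1] },
    'match' => sub { my ($file) = $line =~ /(\S+)\s*$/ },
}, 'none');

# Each result is a Benchmark object; iters is how many runs fit in that time.
printf "%-5s %d iterations\n", $_, $results->{$_}->iters for sort keys %$results;
```

    With a positive COUNT the same call would run each sub exactly that many times instead.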
  17. Ben Morrow <> writes:
    > Quoth "Peter J. Holzer" <>:
    >> On 2013-10-29 18:36, John Black <> wrote:
    >> > In article <>,

    >>
    >> >
    >> > Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
    >> > split which ends up extracting a bunch of fields that are never used here.

    >>
    >> OTOH the regexp probably needs to do a lot of backtracking, so you might
    >> lose that bet.
    >>
    >> Let's see:

    > [...]
    >>        Rate match split
    >> match 208/s    --  -67%
    >> split 625/s  200%    --
    >>
    >>
    >> Yup, split is about 3 times faster for this particular set of strings
    >> (may be wildly different for other strings).

    >
    > Interestingly, perl is much better at optimising /.*\s(\S+)/ (it only
    > has to backtrack over the last word, instead of the whole string), so
    > that comes out faster again:
    >
    >          Rate \S+\s*$   split .*\s\S+
    > \S+\s*$ 274/s      --    -66%    -66%
    > split   794/s    190%      --     -2%
    > .*\s\S+ 812/s    197%      2%      --
    >
    > Not much, though.


    I tried this as well: The more words are on such a line, the better the
    "Don't backtrack!" match becomes.
     
    Rainer Weikusat, Oct 29, 2013
    #17
  18. John Black

    John Black Guest

    In article <>,
    says...
    >
    > Ben Morrow <> writes:
    > > Quoth "Peter J. Holzer" <>:
    > >> On 2013-10-29 18:36, John Black <> wrote:
    > >> > In article <>,
    > >>
    > >> >
    > >> > Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
    > >> > split which ends up extracting a bunch of fields that are never used here.
    > >>
    > >> OTOH the regexp probably needs to do a lot of backtracking, so you might
    > >> lose that bet.
    > >>
    > >> Let's see:

    > > [...]
    > >>        Rate match split
    > >> match 208/s    --  -67%
    > >> split 625/s  200%    --
    > >>
    > >>
    > >> Yup, split is about 3 times faster for this particular set of strings
    > >> (may be wildly different for other strings).

    > >
    > > Interestingly, perl is much better at optimising /.*\s(\S+)/ (it only
    > > has to backtrack over the last word, instead of the whole string), so
    > > that comes out faster again:
    > >
    > >          Rate \S+\s*$   split .*\s\S+
    > > \S+\s*$ 274/s      --    -66%    -66%
    > > split   794/s    190%      --     -2%
    > > .*\s\S+ 812/s    197%      2%      --
    > >
    > > Not much, though.

    >
    > I tried this as well: The more words are on such a line, the better the
    > "Don't backtrack!" match becomes.


    Why does /(\S+)\s*$/ have to backtrack over "the whole string" whereas /.*\s(\S+)/ does not?
    I'm sure I don't understand regex backtracking...

    John Black
     
    John Black, Oct 30, 2013
    #18
  19. John Black

    John Black Guest

    In article <>, says...
    >
    > Quoth John Black <>:
    > >
    > > Why does /(\S+)\s*$/ have to backtrack over "the whole string" whereas
    > > /.*\s(\S+)/ does not?
    > > I'm sure I don't understand regex backtracking...

    >
    > Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
    > following sequence of matches:
    >
    >   \S+     \s*    $
    >   "foo"   " "    no match, backtrack
    >   "fo"    ""     no match, backtrack
    >   "f"     ""     no match, backtrack
    >
    > Now perl has tried all the matches starting at the beginning of the
    > string, so it has to move along the string and try again. It skips over
    > characters matching \S, since it's already tried all possible end-points
    > for \S+ in this word, then it skips over characters not matching \S,
    > since they can't possibly match, and starts again with:
    >
    >   "bar"   " "    no match, backtrack
    >   "ba"    ""     no match, backtrack
    >   "b"     ""     no match, backtrack
    >
    > And again:
    >
    >   "baz"   " "    match
    >
    > With more words in the string, or longer words, this would take more
    > attempts. OTOH, with /.*\s\S+/ it tries these matches:
    >
    >   .*              \s    \S+
    >   "foo bar baz "                no match, backtrack
    >   "foo bar baz"   " "           no match, backtrack
    >   "foo bar ba"                  no match, backtrack
    >   "foo bar b"                   no match, backtrack
    >   "foo bar "                    no match, backtrack
    >   "foo bar"       " "   "baz"
    >
    > which only ever has to backtrack over the last word. In the specific
    > case of a very long last word preceded by a small number of short words
    > it would come out slower than the first match, but otherwise it comes
    > out faster.
    >
    > You can see what perl is doing by running something like
    >
    > perl -Mre=debug -e'"foo bar baz " =~ /.*\s\S+/'
    >
    > though it takes a bit of practice to get used to interpreting the
    > output.
    >
    > Ben


    Ben, thanks for the detailed explanation. This is good stuff to keep in mind when in a
    performance-critical loop, but if I were doing this again, I would still go with /(\S+)\s*$/
    because it is (to me) much clearer about what it's doing. The $ anchor makes it obvious
    that we are grabbing the word at the end of the line. The other regex can match any word on
    the line, and you then have to deduce that the one it returns is the last one
    because of Perl's default greedy matching policy. That makes it less intuitive and
    less readable.

    John Black
     
    John Black, Oct 31, 2013
    #19
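    One behavioural difference between the candidates, apart from speed, is worth checking before choosing: /.*\s(\S+)/ needs at least one whitespace character before the last word, so it does not match a line that consists of the filename alone. A minimal sketch; the helper name last_field_three_ways and the sample lines are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last field as found by each approach.
sub last_field_three_ways {
    my ($line) = @_;
    my ($anchored) = $line =~ /(\S+)\s*$/;    # anchored at end of line
    my ($greedy)   = $line =~ /.*\s(\S+)/;    # needs whitespace before the word
    my $split      = (split ' ', $line)[-1];  # list slice on split
    return ($anchored, $greedy, $split);
}

for my $line ("foo bar baz ", "baz") {
    my @found = map { defined $_ ? $_ : 'undef' } last_field_three_ways($line);
    printf "%-15s anchored=%s greedy=%s split=%s\n", "'$line'", @found;
}
```

    For the single-word line the greedy pattern yields undef, while the other two still return "baz".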
  20. On 10/30/2013 11:29 AM, Ben Morrow wrote:
    >
    > Quoth John Black <>:
    >>
    >> Why does /(\S+)\s*$/ have to backtrack over "the whole string" whereas
    >> /.*\s(\S+)/ does not?
    >> I'm sure I don't understand regex backtracking...

    >
    > Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
    > following sequence of matches:
    >
    >   \S+     \s*    $
    >   "foo"   " "    no match, backtrack
    >   "fo"    ""     no match, backtrack
    >   "f"     ""     no match, backtrack
    >
    > Now perl has tried all the matches starting at the beginning of the
    > string, so it has to move along the string and try again. It skips over
    > characters matching \S, since it's already tried all possible end-points
    > for \S+ in this word, then it skips over characters not matching \S,
    > since they can't possibly match, and starts again with:
    >
    >   "bar"   " "    no match, backtrack
    >   "ba"    ""     no match, backtrack
    >   "b"     ""     no match, backtrack
    >
    > And again:
    >
    >   "baz"   " "    match
    >
    > With more words in the string, or longer words, this would take more
    > attempts.
    > ...


    I thought a possessive quantifier would help with this more intuitive
    alternative: (\S+)\s*$ -> (\S++)\s*+$. But, unless I made some basic
    error, the possessive version came out slower even than the
    backtracking regex. Maybe there are still caching issues, as mentioned
    here:

    http://www.perlmonks.org/bare/?node_id=664545

    --
    Charles DeRykus
     
    Charles DeRykus, Oct 31, 2013
    #20
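    For anyone who wants to repeat the comparison, a minimal sketch showing that the possessive form captures the same field as the plain one. It makes no claim about speed and assumes Perl 5.10 or later, where possessive quantifiers are available; the sample line is the one used in the explanation quoted above:

```perl
use strict;
use warnings;

my $line = "foo bar baz ";

my ($plain)      = $line =~ /(\S+)\s*$/;
my ($possessive) = $line =~ /(\S++)\s*+$/;   # possessive: never gives characters back

print "$plain $possessive\n";   # baz baz
```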
