Newbie needs help with split() and "<"

Discussion in 'Perl Misc' started by Bill, Jul 15, 2004.

  1. Bill

    Bill Guest

    I am trying to split the following line into a list of just the
    numbers. It is a list of xy coordinates.

    <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

    I can use split() with comma, and ">", but not "<". The following
    code works, but I can not add "<" to the regular expression used by
    split(). I have tried various combinations of "\<" with and without
    quotes without success. Any ideas?
    Thanks.


    $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    (@grphdata) = split(/[\,>]/,$tmpline);
    print $tmpline . "\n";
    $i2 = 0;
    while ($grphdata[$i2]){
        print $i2 . " " . $grphdata[$i2] . "\n";
        $i2++;
    }
     
    Bill, Jul 15, 2004
    #1

  2. Bill wrote:
    > I am trying to split the following line into a list of just the
    > numbers. It is a list of xy coordinates.
    >
    > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    >
    > I can use split() with comma, and ">", but not "<". The following
    > code works, but I can not add "<" to the regular expression used by
    > split(). I have tried various combinations of "\<" with and
    > without quotes without success. Any ideas?


    Since it's easier to tell what it is you want than what it is you do
    not want, you'd better use the m// operator instead of split().

    push @grphdata, $1 while $tmpline =~ /(-?\d+)/g;

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 15, 2004
    #2

  3. gmax

    gmax Guest

    Bill wrote:
    > I am trying to split the following line into a list of just the
    > numbers. It is a list of xy coordinates.
    >
    > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    >
    > I can use split() with comma, and ">", but not "<". The following
    > code works, but I can not add "<" to the regular expression used by
    > split(). I have tried various combinations of "\<" with and without
    > quotes without success. Any ideas?
    > Thanks.
    >
    >
    > $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    > (@grphdata) = split(/[\,>]/,$tmpline);
    > print $tmpline . "\n";
    > $i2 = 0;
    > while ($grphdata[$i2]){
    > print $i2 . " " . $grphdata[$i2] . "\n";
    > $i2++;
    > }


    split *is* working. It's the test in your while loop that is faulty :)
    If you use "<" as a separator, the first item will be an empty string,
    i.e. the empty string before the initial "<".
    See

    perldoc -f split


    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    my @grphdata = split(/[,>< ]+/,$tmpline);
    print $tmpline . "\n";

    print Dumper \@grphdata;

    my $i2 = 0;
    while ($grphdata[$i2]){
        print $i2 . " " . $grphdata[$i2] . "\n";
        $i2++;
    }

    __END__
    output:
    $VAR1 = [
    '', # this will evaluate as FALSE
    '-250',
    '-850',
    '-250',
    '800',
    '200',
    '800',
    '200',
    '-850',
    '-250',
    '-850'
    ];


    If you want to avoid the empty items, use
    @grphdata = grep { $_ } split(/[,>< ]+/,$tmpline);
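
    One caveat worth sketching: grep { $_ } keeps only true values, so it
    would also drop a coordinate that is exactly 0. The input below is
    made up for illustration (it is not Bill's data), and grep { length }
    is just one possible alternative that keeps every non-empty string.

```perl
use strict;
use warnings;

# Hypothetical input containing zero coordinates, for illustration only.
my $tmpline = "<0,-850> <-250,0>";

# grep { $_ } tests truth, so both '' and '0' are filtered out.
my @by_truth = grep { $_ } split(/[,>< ]+/, $tmpline);

# grep { length } tests for a non-empty string, so '0' survives.
my @by_length = grep { length } split(/[,>< ]+/, $tmpline);

print "by truth:  @by_truth\n";
print "by length: @by_length\n";
```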


    HTH

    gmax


    --
    ____ ____ _____ _ _
    / _ | \(____ ( \ / )
    ( (_| | | | / ___ |) X (
    \___ |_|_|_\_____(_/ \_)
    (_____|
    Sapere, saper fare, fare, far sapere
    http://gmax.oltrelinux.com
     
    gmax, Jul 15, 2004
    #3
  4. Paul Lalli

    Paul Lalli Guest

    On Thu, 15 Jul 2004, Bill wrote:

    > I am trying to split the following line into a list of just the
    > numbers. It is a list of xy coordinates.
    >
    > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    >
    > I can use split() with comma, and ">", but not "<". The following
    > code works, but I can not add "<" to the regular expression used by
    > split(). I have tried various combinations of "\<" with and without
    > quotes without success. Any ideas?
    > Thanks.


    Here's some very basic advice. Use split() when you know exactly what you
    want to throw away. Use m// when you know exactly what you want to keep.
    In this case, it's far easier to define what you want to keep:

    > $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    > (@grphdata) = split(/[\,>]/,$tmpline);


    @grphdata = m/(-?\d+)/g;


    > print $tmpline . "\n";
    > $i2 = 0;
    > while ($grphdata[$i2]){
    > print $i2 . " " . $grphdata[$i2] . "\n";
    > $i2++;
    > }


    That messy while loop can be better written either of these ways:

    print join(' ', @grphdata), "\n";

    {
        local $" = ' '; # usually not necessary, as ' ' is the default.
        print "@grphdata\n";
    }


    Paul Lalli
     
    Paul Lalli, Jul 15, 2004
    #4
  5. (Bill) wrote in news:7dbe2fe9.0407151125.33e89ac3@posting.google.com:

    > I am trying to split the following line into a list of just the
    > numbers. It is a list of xy coordinates.
    >
    > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    >
    > I can use split() with comma, and ">", but not "<". The following
    > code works, but I can not add "<" to the regular expression used by
    > split(). I have tried various combinations of "\<" with and without
    > quotes without success. Any ideas?


    Splitting on the comma is the wrong thing to do here. What you need here
    is to extract the bracketed coordinates. The right module for this purpose
    is Text::Balanced. (Incidentally, try the following Google search and see
    what comes up: http://www.google.com/search?q=perl extract bracketed).

    > $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    > (@grphdata) = split(/[\,>]/,$tmpline);
    > print $tmpline . "\n";
    > $i2 = 0;
    > while ($grphdata[$i2]){
    > print $i2 . " " . $grphdata[$i2] . "\n";
    > $i2++;
    > }


    Ugly :(

    #! perl

    use strict;
    use warnings;

    use Text::Balanced qw(extract_bracketed);

    my @points;

    while(my $line = <DATA>) {
        while(my $next = extract_bracketed $line, '<>') {
            if($next =~ /^<(-?\d+),(-?\d+)>$/) {
                push @points, { x => $1, y => $2 };
            }
        }
    }

    use Data::Dumper;
    print Dumper \@points;

    __DATA__
    <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

    C:\Home> test
    $VAR1 = [
    {
    'y' => '-850',
    'x' => '-250'
    },
    {
    'y' => '800',
    'x' => '-250'
    },
    {
    'y' => '800',
    'x' => '200'
    },
    {
    'y' => '-850',
    'x' => '200'
    },
    {
    'y' => '-850',
    'x' => '-250'
    },
    {
    'y' => '-850',
    'x' => '-250'
    },
    {
    'y' => '800',
    'x' => '-250'
    },
    {
    'y' => '800',
    'x' => '200'
    },
    {
    'y' => '-850',
    'x' => '200'
    },
    {
    'y' => '-850',
    'x' => '-250'
    },
    ];



    --
    A. Sinan Unur
    d
    (remove '.invalid' and reverse each component for email address)
     
    A. Sinan Unur, Jul 15, 2004
    #5
  6. Gunnar Hjalmarsson wrote:
    >
    > Bill wrote:
    > > I am trying to split the following line into a list of just the
    > > numbers. It is a list of xy coordinates.
    > >
    > > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    > >
    > > I can use split() with comma, and ">", but not "<". The following
    > > code works, but I can not add "<" to the regular expression used by
    > > split(). I have tried various combinations of "\<" with and
    > > without quotes without success. Any ideas?

    >
    > Since it's easier to tell what it is you want than what it is you do
    > not want, you'd better use the m// operator instead of split().
    >
    > push @grphdata, $1 while $tmpline =~ /(-?\d+)/g;


    Or even simply:

    my @grphdata = $tmpline =~ /-?\d+/g;


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Jul 15, 2004
    #6
  7. Paul Lalli

    Paul Lalli Guest

    On Thu, 15 Jul 2004, Paul Lalli wrote:

    > On Thu, 15 Jul 2004, Bill wrote:
    >
    > > $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    > > (@grphdata) = split(/[\,>]/,$tmpline);

    >
    > @grphdata = m/(-?\d+)/g;


    Whoops. This assumes the data string is in $_. If it's not, you need to
    type a little bit more:

    @grphdata = $tmpline =~ m/(-?\d+)/g;

    Paul Lalli
     
    Paul Lalli, Jul 15, 2004
    #7
  8. (Bill) writes:

    > I am trying to split the following line into a list of just the
    > numbers. It is a list of xy coordinates.
    >
    > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    >
    > I can use split() with comma, and ">", but not "<". The following
    > code works, but I can not add "<" to the regular expression used by
    > split(). I have tried various combinations of "\<" with and without
    > quotes without success. Any ideas?
    > Thanks.
    >
    >
    > $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    > (@grphdata) = split(/[\,>]/,$tmpline);


    This does what I think you mean:

    (@grphdata) = split(/[\,>< ]+/,$tmpline);

    Without a space in the character class and without a + afterwards,
    you'd be getting an extra element consisting of a single space between
    each of the pairs. The character class now includes the space
    character, and the + gobbles up each whole run of separators.
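
    A small sketch of that difference, using a shortened copy of the
    input line. The variable names @single and @run are only for this
    illustration. Note that both versions still start with an empty
    field, because the string begins with a separator.

```perl
use strict;
use warnings;

# Shortened copy of the input line from the original post.
my $tmpline = "<-250,-850> <-250,800>";

# No space in the class and no "+": every separator character splits
# on its own, so the space between "> <" comes back as a field.
my @single = split(/[,><]/, $tmpline);

# Space in the class plus "+": a whole run such as "> <" counts as
# a single separator.
my @run = split(/[,>< ]+/, $tmpline);

print scalar(@single), " fields: ", join('|', @single), "\n";
print scalar(@run),    " fields: ", join('|', @run),    "\n";
```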

    -----ScottG.
     
    Scott W Gifford, Jul 15, 2004
    #8
  9. Bill

    Bill Guest

    (Bill) wrote in message news:<>...
    > I am trying to split the following line into a list of just the
    > numbers. It is a list of xy coordinates.
    >
    > <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
    >
    > I can use split() with comma, and ">", but not "<". The following
    > code works, but I can not add "<" to the regular expression used by
    > split(). I have tried various combinations of "\<" with and without
    > quotes without success. Any ideas?
    > Thanks.
    >
    >
    > $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
    > (@grphdata) = split(/[\,>]/,$tmpline);
    > print $tmpline . "\n";
    > $i2 = 0;
    > while ($grphdata[$i2]){
    > print $i2 . " " . $grphdata[$i2] . "\n";
    > $i2++;
    > }


    wow… great replies everyone. I have learned a lot just now. Thanks.
    I thought the problem was with the "<", and trying to search using
    "<", was driving me nuts.

    Sorry about the messy code, just pieces I cut out of the script to
    test the split function, I will clean it up, I promise :)
     
    Bill, Jul 16, 2004
    #9
  10. Juha Laiho

    Juha Laiho Guest

    "A. Sinan Unur" <> said:
    > (Bill) wrote in news::
    >
    >> I am trying to split the following line into a list of just the
    >> numbers. It is a list of xy coordinates.
    >>
    >> <-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

    ....
    >Splitting on the comma is the wrong thing to do here. What you need here
    >is to extract the bracketed coordinates. The right module for this purpose
    >is Text::Balanced. (Incidentally, try the following Google search and see
    >what comes up: http://www.google.com/search?q=perl extract bracketed).

    ....
    >#! perl
    >
    >use strict;
    >use warnings;
    >
    >use Text::Balanced qw(extract_bracketed);
    >
    >my @points;
    >
    >while(my $line = <DATA>) {
    > while(my $next = extract_bracketed $line, '<>') {
    > if($next =~ /^<(-?\d+),(-?\d+)>$/) {
    > push @points, { x => $1, y => $2 };
    > }
    > }
    >}

    ....

    Interesting. I was for a while myself trying to get this to work with just

    foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
        print "$1 $2\n";
    }

    .... but for some reason that just printed the first pair of numbers over
    and over (the correct total amount, though). Do you have any idea why this
    would be so? Still in the above $_ is updated correctly for each match,
    but $1 and $2 stay set to the first pair of numbers. I ended up with

    foreach ($line =~ m/<.*?>/g) {
        m/<(-?\d+),(-?\d+)>/;
        print "$1 $2\n";
    }

    where $1 and $2 get updated as I expected.
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
     
    Juha Laiho, Jul 16, 2004
    #10
  11. Juha Laiho wrote:
    > I was for a while myself trying to get this to work with just
    >
    > foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    > print "$1 $2\n";
    > }
    >
    > ... but for some reason that just printed the first pair of numbers
    > over and over (the correct total amount, though). Do you have any
    > idea why this would be so?


    Try "while" instead of "foreach".

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 16, 2004
    #11
  12. Juha Laiho

    Juha Laiho Guest

    Gunnar Hjalmarsson <> said:
    >Juha Laiho wrote:
    >> I was for a while myself trying to get this to work with just
    >>
    >> foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    >> print "$1 $2\n";
    >> }
    >>
    >> ... but for some reason that just printed the first pair of numbers
    >> over and over (the correct total amount, though). Do you have any
    >> idea why this would be so?

    >
    >Try "while" instead of "foreach".


    Ok, that works as I expected... but now I'm even more stumped -- could
    I have a language-lawyer explanation for the differences between these
    two cases? Hmm.. is it a context issue -- while apparently evaluates
    its condition expression in scalar context, whereas foreach uses
    list context? But still I seem to have slight problem in fully
    understanding why $1 and $2 are only set once in the foreach case
    (esp. that foreach does update $_ for each round through the loop).
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
     
    Juha Laiho, Jul 16, 2004
    #12
  13. Juha Laiho wrote:
    > Gunnar Hjalmarsson <> said:
    >> Juha Laiho wrote:
    >>> I was for a while myself trying to get this to work with just
    >>>
    >>> foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    >>> print "$1 $2\n";
    >>> }
    >>>
    >>> ... but for some reason that just printed the first pair of
    >>> numbers over and over (the correct total amount, though). Do
    >>> you have any idea why this would be so?

    >>
    >> Try "while" instead of "foreach".

    >
    > Ok, that works as I expected... but now I'm even more stumped --
    > could I have a language-lawyer explanation for the differences
    > between these two cases?


    I think it is because foreach (or for) creates the whole list to loop
    over before the loop actually starts.

    > But still I seem to have slight problem in fully understanding why
    > $1 and $2 are only set once in the foreach case


    It's set multiple times - before the loop starts. Consequently, $1 and
    $2 will contain the values from the last time the regex matches (i.e.
    they contain the last pair of numbers, not the first pair as you said
    in another message).
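
    A minimal sketch of the two behaviours side by side. The data is made
    up so that the first and last pairs differ, and the array names are
    only for this illustration.

```perl
use strict;
use warnings;

# Made-up data with distinct first and last pairs, for illustration only.
my $line = "<1,2> <3,4> <5,6>";

# foreach: the match runs to completion in list context before the
# first iteration, so $1 and $2 already hold the last pair.
my @seen_in_foreach;
foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    push @seen_in_foreach, "$1 $2";
}

# while: the match runs once per iteration in scalar context,
# so $1 and $2 advance with each pair.
my @seen_in_while;
while ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    push @seen_in_while, "$1 $2";
}

print "foreach: $_\n" for @seen_in_foreach;
print "while:   $_\n" for @seen_in_while;
```

    The foreach version also loops once per captured value (six times
    here) rather than once per pair, since the list it walks is the flat
    list of captures.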

    HTH

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 16, 2004
    #13
  14. Paul Lalli

    Paul Lalli Guest

    On Fri, 16 Jul 2004, Juha Laiho wrote:

    > Gunnar Hjalmarsson <> said:
    > >Juha Laiho wrote:
    > >> I was for a while myself trying to get this to work with just
    > >>
    > >> foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    > >> print "$1 $2\n";
    > >> }
    > >>
    > >> ... but for some reason that just printed the first pair of numbers
    > >> over and over (the correct total amount, though). Do you have any
    > >> idea why this would be so?

    > >
    > >Try "while" instead of "foreach".

    >
    > Ok, that works as I expected... but now I'm even more stumped -- could
    > I have a language-lawyer explanation for the differences between these
    > two cases? Hmm.. is it a context issue -- while apparently evaluates
    > its condition expression in scalar context, whereas foreach uses
    > list context? But still I seem to have slight problem in fully
    > understanding why $1 and $2 are only set once in the foreach case
    > (esp. that foreach does update $_ for each round through the loop).


    You are correct. The two syntaxes are:
    while (EXPR) { }
    and
    foreach SCALAR (LIST) { }

    Using a while, you are evaluating m//g in a scalar context. Each time
    through the loop, $1 and $2 are set to the captured sub patterns in that
    pattern match. The /g modifier remembers where the last one left off and
    starts the next match at that point.
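
    The remembered position can be inspected with pos(). A short sketch,
    with made-up data and an illustrative @positions array:

```perl
use strict;
use warnings;

# Made-up data, for illustration only.
my $line = "<1,2> <3,4>";

# In scalar context each m//g call resumes where the previous one
# stopped; pos($line) is the offset just past the latest match.
my @positions;
while ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    push @positions, pos($line);
}

# Once the match finally fails, the position is reset to undef,
# so a later m//g on $line starts from the beginning again.
my $reset = defined pos($line) ? "no" : "yes";

print "positions: @positions\n";
print "reset after failure: $reset\n";
```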

    Using a foreach, the m//g is evaluated in list context exactly once. It
    is as though you had actually said:
    @matches = $line =~ m/<(-?\d+),(-?\d+)>/g;
    foreach (@matches){
        print "$1 $2\n";
    }

    As you can see, the pattern match is only executed once. Therefore $1 and
    $2 are only set once - they are set to the captured parentheses that
    represent the first pattern match. In a list context, however, m//g
    returns all the parenthesized matches. So the foreach loop is still
    executed the number of times you expect it to be.

    Does that clear things up at all?

    Paul Lalli
     
    Paul Lalli, Jul 16, 2004
    #14
  15. Paul Lalli

    Paul Lalli Guest

    On Fri, 16 Jul 2004, Paul Lalli wrote:

    > Using a foreach, the m//g is evaluated in list context exactly once. It
    > is as though you had actually said:
    > @matches = $line =~ m/<(-?\d+),(-?\d+)>/g;
    > foreach (@matches){
    > print "$1 $2\n";
    > }
    >
    > As you can see, the pattern match is only executed once. Therefore $1 and
    > $2 are only set once - they are set to the captured parentheses that
    > represent the first pattern match. In a list context, however, m//g
    > returns all the parenthesized matches. So the foreach loop is still
    > executed the number of times you expect it to be.


    Hrm. Gunnar's explanation (in another post to this thread) is correct.
    $1 and $2 get the last value matched, not the first. I misunderstood my
    own test case.
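
    If the list-context form is what you want, one way to avoid relying
    on $1 and $2 at all is to walk the returned list two values at a
    time. This is only a sketch; @flat and @pairs are names chosen for
    the illustration.

```perl
use strict;
use warnings;

my $line = "<-250,-850> <-250,800> <200,800>";

# In list context m//g returns every capture as one flat list:
# (x1, y1, x2, y2, ...).
my @flat = $line =~ m/<(-?\d+),(-?\d+)>/g;

# Remove two values per iteration; the loop ends when splice
# returns an empty list.
my @pairs;
while (my ($x, $y) = splice(@flat, 0, 2)) {
    push @pairs, "$x $y";
}

print "$_\n" for @pairs;
```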

    Apologies,
    Paul Lalli
     
    Paul Lalli, Jul 16, 2004
    #15
  16. Juha Laiho

    Juha Laiho Guest

    [captions re-ordered a bit, to cut the message to a reasonable length]

    Gunnar Hjalmarsson <> said:
    >Juha Laiho wrote:
    >> But still I seem to have slight problem in fully understanding why
    >> $1 and $2 are only set once in the foreach case


    >I think it is because foreach (or for) creates the whole list to loop
    >over before the loop actually starts.

    ....
    >It's set multiple times - before the loop starts. Consequently, $1 and
    >$2 will contain the values from the last time the regex matches (i.e.
    >they contain the last pair of numbers, not the first pair as you said
    >in another message).


    Gunnar, thanks -- this does make sense. Rewriting my test case so that
    the first and last value pairs were different confirms what you write
    above (silly me!) -- $1 and $2 stay set to the values found with the
    last match.

    Also, this provides a very good insight for the rationale to have
    these two loop constructs that seem so similar at first sight.

    Even though this isn't a FAQ as such, I think having a FAQ entry
    describing the differences in loop constructs in more detail might
    make sense (and yes, I know by making the suggestion I'm setting
    myself as the volunteer to provide the question and answer -- let's
    see whether I can accomplish that or not).

    And thanks also to Paul.
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
     
    Juha Laiho, Jul 17, 2004
    #16
  17. Joe Smith

    Joe Smith Guest

    Juha Laiho wrote:

    > foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    > print "$1 $2\n";
    > }


    In addition to everything else said in this thread, you should remember
    this: Never use $1 (and friends) without testing if the pattern matched.

    Bad:
        $line =~ m/(\d+),(\d+)/;
        print "found $1 and $2\n"; # Wrong data if match failed

    Better:
        if ($line =~ m/(\d+),(\d+)/) {
            print "found $1 and $2\n";
        } else {
            print "did not find a pair of numbers\n";
        }
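
    To see where the "wrong data" comes from: a failed match leaves $1
    and $2 untouched, so they still hold whatever the last successful
    match in the same scope captured. A small sketch with made-up
    strings ($good and $bad are illustrative names):

```perl
use strict;
use warnings;

# Made-up strings, for illustration only.
my $good = "10,20";
my $bad  = "no numbers here";

$good =~ m/(\d+),(\d+)/;    # succeeds and sets $1 and $2
$bad  =~ m/(\d+),(\d+)/;    # fails and leaves $1 and $2 as they were

# Unchecked: this reports the numbers from $good, not from $bad.
my $unchecked = "found $1 and $2";

# Checked: $1 and $2 are only read when the match succeeded.
my $checked = ($bad =~ m/(\d+),(\d+)/)
    ? "found $1 and $2"
    : "did not find a pair of numbers";

print "$unchecked\n";
print "$checked\n";
```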

    -Joe
     
    Joe Smith, Jul 17, 2004
    #17
  18. Juha Laiho wrote:
    > Even though this isn't a FAQ as such, I think having a FAQ entry
    > describing the differences in loop constructs in more detail might
    > make sense (and yes, I know by making the suggestion I'm setting
    > myself as the volunteer to provide the question and answer -- let's
    > see whether I can accomplish that or not).


    I'd like to encourage you to write a suggestion. Personally I have
    mixed up for and while many times, and the docs seem not to include
    clear and concise descriptions of their syntax (or have I missed
    something?), merely a few examples in "perldoc perlsyn".

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jul 17, 2004
    #18
  19. Juha Laiho

    Juha Laiho Guest

    Joe Smith <> said:
    >Juha Laiho wrote:
    >> foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    >> print "$1 $2\n";
    >> }

    >
    >In addition to everything else said in this thread, you should remember
    >this: Never use $1 (and friends) without testing if the pattern matched.
    >
    >Bad:
    > $line =~ m/(\d+),(\d+)/;

    ....
    >Better:
    > if ($line =~ m/(\d+),(\d+)/) {

    ....

    Yep,

    though in context of this thread of discussion, the testing is embedded
    into the loop construct: the loop body will not be executed if the
    pattern match fails.
    --
    Wolf a.k.a. Juha Laiho Espoo, Finland
    (GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
    PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
    "...cancel my subscription to the resurrection!" (Jim Morrison)
     
    Juha Laiho, Jul 17, 2004
    #19