Newbie needs help with split() and "<"

Bill

I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and without
quotes without success. Any ideas?
Thanks.


$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);
print $tmpline . "\n";
$i2 = 0;
while ($grphdata[$i2]){
print $i2 . " " . $grphdata[$i2] . "\n";
$i2++;
}
 
Gunnar Hjalmarsson

Bill said:
I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and
without quotes without success. Any ideas?

Since it's easier to tell what it is you want than what it is you do
not want, you'd better use the m// operator instead of split().

push @grphdata, $1 while $tmpline =~ /(-?\d+)/g;
 
gmax

Bill said:
I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and without
quotes without success. Any ideas?
Thanks.


$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);
print $tmpline . "\n";
$i2 = 0;
while ($grphdata[$i2]){
print $i2 . " " . $grphdata[$i2] . "\n";
$i2++;
}

split *is* working. It's the test in your while loop that is faulty :)
If you use "<" as a separator, the first item will be an empty string,
i.e. the empty string before the initial "<".
See

perldoc -f split


#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
my @grphdata = split(/[,>< ]+/,$tmpline);
print $tmpline . "\n";

print Dumper \@grphdata;

my $i2 = 0;
while ($grphdata[$i2]){
print $i2 . " " . $grphdata[$i2] . "\n";
$i2++;
}

__END__
output:
$VAR1 = [
'', # this will evaluate as FALSE
'-250',
'-850',
'-250',
'800',
'200',
'800',
'200',
'-850',
'-250',
'-850' # no trailing '' here: split strips trailing empty fields by default
];


If you want to avoid the empty items, use
@grphdata = grep { $_ } split(/[,>< ]+/,$tmpline);
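
One caveat with a truth test as the filter: a coordinate of 0 is also false in Perl, so it would be dropped along with the empty strings. The sketch below uses a made-up input line (not from the original data) that contains zeros, and compares a truth test with a length test.

```perl
use strict;
use warnings;

# Hypothetical input: note the coordinate 0, which is a legitimate value.
my $tmpline = "<0,-850> <-250,0>";

# Truth test: drops the empty leading field, but also drops both "0" entries.
my @by_truth = grep { $_ } split /[,>< ]+/, $tmpline;

# Length test: drops only the empty strings and keeps the zeros.
my @by_length = grep { length } split /[,>< ]+/, $tmpline;

print "truth:  @by_truth\n";
print "length: @by_length\n";
```

With this input the truth test keeps only -850 and -250, while the length test keeps all four numbers.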


HTH

gmax


--
Sapere, saper fare, fare, far sapere
http://gmax.oltrelinux.com
 
Paul Lalli

Bill said:
I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and without
quotes without success. Any ideas?
Thanks.

Here's some very basic advice. Use split() when you know exactly what you
want to throw away. Use m// when you know exactly what you want to keep.
In this case, it's far easier to define what you want to keep:
$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);

@grphdata = m/(-?\d+)/g;

print $tmpline . "\n";
$i2 = 0;
while ($grphdata[$i2]){
print $i2 . " " . $grphdata[$i2] . "\n";
$i2++;
}

That messy while loop can be better written either of these ways:

print join(' ', @grphdata), "\n";

{
local $" = ' '; #usually not necessary, as ' ' is the default.
print "@grphdata\n";
}


Paul Lalli
 
A. Sinan Unur

Bill said:
I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and without
quotes without success. Any ideas?

Splitting on the comma is the wrong thing to do here. What you need here
is to extract the bracketed coordinates. The right module for this purpose
is Text::Balanced. (Incidentally, try the following Google search and see
what comes up: http://www.google.com/search?q=perl+extract+bracketed).
$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);
print $tmpline . "\n";
$i2 = 0;
while ($grphdata[$i2]){
print $i2 . " " . $grphdata[$i2] . "\n";
$i2++;
}

Ugly :(

#! perl

use strict;
use warnings;

use Text::Balanced qw(extract_bracketed);

my @points;

while(my $line = <DATA>) {
    while(my $next = extract_bracketed $line, '<>') {
        if($next =~ /^<(-?\d+),(-?\d+)>$/) {
            push @points, { x => $1, y => $2 };
        }
    }
}

use Data::Dumper;
print Dumper \@points;

__DATA__
<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>
<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

C:\Home> test
$VAR1 = [
{
'y' => '-850',
'x' => '-250'
},
{
'y' => '800',
'x' => '-250'
},
{
'y' => '800',
'x' => '200'
},
{
'y' => '-850',
'x' => '200'
},
{
'y' => '-850',
'x' => '-250'
},
{
'y' => '-850',
'x' => '-250'
},
{
'y' => '800',
'x' => '-250'
},
{
'y' => '800',
'x' => '200'
},
{
'y' => '-850',
'x' => '200'
},
{
'y' => '-850',
'x' => '-250'
},
];
 
John W. Krahn

Gunnar said:
Since it's easier to tell what it is you want than what it is you do
not want, you'd better use the m// operator instead of split().

push @grphdata, $1 while $tmpline =~ /(-?\d+)/g;

Or even simply:

my @grphdata = $tmpline =~ /-?\d+/g;
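
Since the numbers are xy coordinates, the flat list can be regrouped into pairs afterwards. This is only a sketch: the @points name and the array-ref layout are assumptions for illustration, not something the original script uses.

```perl
use strict;
use warnings;

my $tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";

# A list-context match with /g returns every number in order.
my @grphdata = $tmpline =~ /-?\d+/g;

# Regroup the flat list into [x, y] pairs (hypothetical structure).
my @points;
for (my $i = 0; $i + 1 < @grphdata; $i += 2) {
    push @points, [ @grphdata[$i, $i + 1] ];
}

print scalar(@grphdata), " numbers, ", scalar(@points), " points\n";
print "($_->[0], $_->[1])\n" for @points;
```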


John
 
Paul Lalli

Paul Lalli said:
$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);

@grphdata = m/(-?\d+)/g;

Whoops. This assumes the data string is in $_. If it's not, you need to
type a little bit more:

@grphdata = $tmpline =~ m/(-?\d+)/g;

Paul Lalli
 
Scott W Gifford

Bill said:
I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and without
quotes without success. Any ideas?
Thanks.


$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);

This does what I think you mean:

(@grphdata) = split(/[\,>< ]+/,$tmpline);

Without a space in the character class and without a + afterwards,
you'd be getting an extra element consisting of a single space between
each of the pairs. The space in the class matches the space character,
and the + makes a whole run of delimiters, such as "> <", count as a
single separator.
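
A small sketch of the difference, using a shortened copy of the input line so the field counts are easy to check:

```perl
use strict;
use warnings;

my $tmpline = "<-250,-850> <-250,800>";

# No space in the class and no +: every single delimiter character splits,
# so the space between "> " and "<" survives as a field of its own.
my @loose = split /[,><]/, $tmpline;

# Space in the class plus +: a run such as "> <" counts as one separator.
my @tight = split /[,>< ]+/, $tmpline;

print scalar(@loose), " fields: ", join('|', @loose), "\n";
print scalar(@tight), " fields: ", join('|', @tight), "\n";
```

Both versions still start with an empty field, because the line begins with a delimiter.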

-----ScottG.
 
Bill

Bill said:
I am trying to split the following line into a list of just the
numbers. It is a list of xy coordinates.

<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>

I can use split() with comma, and ">", but not "<". The following
code works, but I can not add "<" to the regular expression used by
split(). I have tried various combinations of "\<" with and without
quotes without success. Any ideas?
Thanks.


$tmpline = "<-250,-850> <-250,800> <200,800> <200,-850> <-250,-850>";
(@grphdata) = split(/[\,>]/,$tmpline);
print $tmpline . "\n";
$i2 = 0;
while ($grphdata[$i2]){
print $i2 . " " . $grphdata[$i2] . "\n";
$i2++;
}

Wow, great replies everyone. I have learned a lot just now. Thanks.
I thought the problem was with the "<", and trying to search using
"<", was driving me nuts.

Sorry about the messy code, just pieces I cut out of the script to
test the split function, I will clean it up, I promise :)
 
Juha Laiho

A. Sinan Unur said:
(e-mail address removed) (Bill) wrote in ....
Splitting on the comma is the wrong thing to do here. What you need here
is to extract the bracketed coordinates. The right module for this purpose
is Text::Balanced. (Incidentally, try the following Google search and see
what comes up: http://www.google.com/search?q=perl+extract+bracketed). ....
#! perl

use strict;
use warnings;

use Text::Balanced qw(extract_bracketed);

my @points;

while(my $line = <DATA>) {
    while(my $next = extract_bracketed $line, '<>') {
        if($next =~ /^<(-?\d+),(-?\d+)>$/) {
            push @points, { x => $1, y => $2 };
        }
    }
}
....

Interesting. I was for a while myself trying to get this to work with just

foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
print "$1 $2\n";
}

.... but for some reason that just printed the first pair of numbers over
and over (the correct total amount, though). Do you have any idea why this
would be so? Still in the above $_ is updated correctly for each match,
but $1 and $2 stay set to the first pair of numbers. I ended up with

foreach ($line =~ m/<.*?>/g) {
m/<(-?\d+),(-?\d+)>/;
print "$1 $2\n";
}

where $1 and $2 get updated as I expected.
 
Gunnar Hjalmarsson

Juha said:
I was for a while myself trying to get this to work with just

foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
print "$1 $2\n";
}

... but for some reason that just printed the first pair of numbers
over and over (the correct total amount, though). Do you have any
idea why this would be so?

Try "while" instead of "foreach".
 
Juha Laiho

Gunnar Hjalmarsson said:
Try "while" instead of "foreach".

Ok, that works as I expected... but now I'm even more stumped -- could
I have a language-lawyer explanation for the differences between these
two cases? Hmm.. is it a context issue -- while apparently evaluates
its condition expression in scalar context whereas foreach uses
list context? But still I seem to have slight problem in fully
understanding why $1 and $2 are only set once in the foreach case
(esp. that foreach does update $_ for each round through the loop).
 
Gunnar Hjalmarsson

Juha said:
Ok, that works as I expected... but now I'm even more stumped --
could I have a language-lawyer explanation for the differences
between these two cases?

I think it is because foreach (or for) creates the whole list to loop
over before the loop actually starts.
But still I seem to have slight problem in fully understanding why
$1 and $2 are only set once in the foreach case

It's set multiple times - before the loop starts. Consequently, $1 and
$2 will contain the values from the last time the regex matches (i.e.
they contain the last pair of numbers, not the first pair as you said
in another message).

HTH
 
Paul Lalli

Juha said:
Ok, that works as I expected... but now I'm even more stumped -- could
I have a language-lawyer explanation for the differences between these
two cases? Hmm.. is it a context issue -- while apparently evaluates
its condition expression in scalar context whereas foreach uses
list context? But still I seem to have slight problem in fully
understanding why $1 and $2 are only set once in the foreach case
(esp. that foreach does update $_ for each round through the loop).

You are correct. The two syntaxes are:
while (EXPR) { }
and
foreach SCALAR (LIST) { }

Using a while, you are evaluating m//g in a scalar context. Each time
through the loop, $1 and $2 are set to the captured sub patterns in that
pattern match. The /g modifier remembers where the last one left off and
starts the next match at that point.

Using a foreach, the m//g is evaluated in list context exactly once. It
is as though you had actually said:
@matches = $line =~ m/<(-?\d+),(-?\d+)>/g;
foreach (@matches){
    print "$1 $2\n";
}

As you can see, the pattern match is only executed once. Therefore $1 and
$2 are only set once - they are set to the captured parentheses that
represent the first pattern match. In a list context, however, m//g
returns all the parenthesized matches. So the foreach loop is still
executed the number of times you expect it to be.

Does that clear things up at all?

Paul Lalli
 
Paul Lalli

Paul Lalli said:
Using a foreach, the m//g is evaluated in list context exactly once. It
is as though you had actually said:
@matches = $line =~ m/<(-?\d+),(-?\d+)>/g;
foreach (@matches){
    print "$1 $2\n";
}

As you can see, the pattern match is only executed once. Therefore $1 and
$2 are only set once - they are set to the captured parentheses that
represent the first pattern match. In a list context, however, m//g
returns all the parenthesized matches. So the foreach loop is still
executed the number of times you expect it to be.

Hrm. Gunnar's explanation (in another post to this thread) is correct.
$1 and $2 get the last value matched, not the first. I misunderstood my
own test case.

Apologies,
Paul Lalli
 
Juha Laiho

[captions re-ordered a bit, to cut the message to a reasonable length]

Gunnar said:
I think it is because foreach (or for) creates the whole list to loop
over before the loop actually starts. ....
It's set multiple times - before the loop starts. Consequently, $1 and
$2 will contain the values from the last time the regex matches (i.e.
they contain the last pair of numbers, not the first pair as you said
in another message).

Gunnar, thanks -- this does make sense. Rewriting my test case so that
the first and last value pairs were different confirms what you write
above (silly me!) -- $1 and $2 stay set to the values found with the
last match.
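
A minimal sketch of such a test case, assuming a hypothetical line whose first and last pairs differ. It collects what each loop sees in $1 and $2 so the two can be compared side by side.

```perl
use strict;
use warnings;

# Hypothetical test line: first and last pairs are different.
my $line = "<1,2> <3,4> <5,6>";

# while: m//g runs in scalar context, one match per iteration.
my @from_while;
while ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    push @from_while, "$1 $2";
}

# foreach: m//g runs in list context, so every match has already
# happened before the first iteration of the loop body.
my @from_foreach;
foreach my $capture ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    push @from_foreach, "$1 $2";
}

print "while:   ", join(', ', @from_while), "\n";
print "foreach: ", join(', ', @from_foreach), "\n";
```

The while loop sees each pair in turn. The foreach loop runs once per captured number (six times here, not three) and sees the last pair, 5 and 6, every time.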

Also, this provides a very good insight for the rationale to have
these two loop constructs that seem so similar at first sight.

Even though this isn't a FAQ as such, I think having a FAQ entry
describing the differences in loop constructs in more detail might
make sense (and yes, I know by making the suggestion I'm setting
myself as the volunteer to provide the question and answer -- let's
see whether I can accomplish that or not).

And thanks also to Paul.
 
Joe Smith

Juha said:
foreach ($line =~ m/<(-?\d+),(-?\d+)>/g) {
print "$1 $2\n";
}

In addition to everything else said in this thread, you should remember
this: Never use $1 (and friends) without testing if the pattern matched.

Bad:
$line =~ m/(\d+),(\d+)/;
print "found $1 and $2\n"; # Wrong data if match failed

Better:
if ($line =~ m/(\d+),(\d+)/) {
print "found $1 and $2\n";
} else {
print "did not find a pair of numbers\n";
}

-Joe
 
Gunnar Hjalmarsson

Juha said:
Even though this isn't a FAQ as such, I think having a FAQ entry
describing the differences in loop constructs in more detail might
make sense (and yes, I know by making the suggestion I'm setting
myself as the volunteer to provide the question and answer -- let's
see whether I can accomplish that or not).

I'd like to encourage you to write a suggestion. Personally I have
mixed up for and while many times, and the docs seem not to include
clear and concise descriptions of their syntax (or have I missed
something?), merely a few examples in "perldoc perlsyn".
 
Juha Laiho

Joe Smith said:
In addition to everything else said in this thread, you should remember
this: Never use $1 (and friends) without testing if the pattern matched.

Bad:
$line =~ m/(\d+),(\d+)/; ....
Better:
if ($line =~ m/(\d+),(\d+)/) {
....

Yep,

though in context of this thread of discussion, the testing is embedded
into the loop construct: the loop body will not be executed if the
pattern match fails.
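
To illustrate, here is a sketch with a made-up line that contains no coordinate pairs. The $iterations counter is only there for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical line containing no <x,y> pairs at all.
my $line = "no coordinates here";

my $iterations = 0;
while ($line =~ m/<(-?\d+),(-?\d+)>/g) {
    # Only reached after a successful match, so $1 and $2 are safe here.
    $iterations++;
    print "$1 $2\n";
}

print "loop body ran $iterations times\n";
```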
 
