Should be a simple parsing problem

Tim

Hello all, I am trying to get a simple Perl script working to
transform some data, but I have an incredibly onerous solution that
looks more like Visual Basic than Perl, using lots of conditional
statements, while loops and the shift function. Something inside
doesn't feel right about that, plus I think I am missing out on
expanding my limited Perl knowledge.

My data is in the general form:
_________________________
TEXT {
title: test
font: script
}

POLYGON {
name: foo
type: good
POINTS 3 {
4,3
2,16
633,2
}
}

JUNK {
title: nothing
}

POLYGON {
name: foo2
type: bad
POINTS 2 {
7,9
3,2
}

}

Now I want to extract the points where the polygon type is 'good' so
my output would be:
4,3
2,16
633,2

I can get Text::Balanced to work on a single line, but don't know how
to elegantly parse down to the fields I need. Any thoughts would be
greatly appreciated. Not committed to Text::Balanced, but it seems
like it should work.

best,

Tim
 
Martijn Lievaart

OTTOMH:

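while (<>) {
    /^POLYGON\s*\{/ or next;
    my $good = 0;
    while (<>) {
        /^\s*type:\s*good\s*$/ and $good = 1;
        if (/^\s*POINTS\b/) {
            # handle the points in the same way: read up to the closing brace
            while (<>) {
                /^\s*\}\s*$/ and last;
                print if $good;
            }
            next;   # $_ still holds the POINTS "}", so skip the check below
        }
        /^\s*\}\s*$/ and last;   # closing brace of POLYGON
    }
}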

This obviously assumes your input is always formatted in the same way, is
always correct, and that type: comes before POINTS.

HTH
M4
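
The caveat above about type: having to come before the POINTS block can be relaxed by buffering the points and deciding only when the polygon closes. A minimal sketch of that idea, assuming points are always plain "x,y" lines and braces sit on their own lines as in the sample; good_points_any_order is a hypothetical name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the point lines of
# 'good' polygons whether "type:" comes before or after the POINTS block.
sub good_points_any_order {
    my (@lines) = @_;
    my (@out, @buffer);
    my $type  = '';
    my $depth = 0;    # 0 = outside POLYGON, 1 = in POLYGON, 2 = in POINTS
    for my $line (@lines) {
        if ($depth == 0) {
            if ($line =~ /^\s*POLYGON\s*\{/) {
                $depth  = 1;
                @buffer = ();
                $type   = '';
            }
            next;     # ignore everything outside POLYGON blocks
        }
        if    ($line =~ /^\s*type:\s*(\S+)/) { $type = $1 }
        elsif ($line =~ /^\s*POINTS\b.*\{/)  { $depth = 2 }
        elsif ($line =~ /^\s*\}/) {
            $depth--;
            # Only decide once the whole polygon has been seen.
            push @out, @buffer if $depth == 0 && $type eq 'good';
        }
        elsif ($depth == 2 && $line =~ /^\s*(\d+,\d+)\s*$/) {
            push @buffer, $1;
        }
    }
    return @out;
}

# Sample where type: follows POINTS in the first polygon.
my @sample = split /^/, <<'END';
POLYGON {
name: foo
POINTS 2 {
1,2
3,4
}
type: good
}
POLYGON {
type: bad
POINTS 1 {
9,9
}
}
END

print "$_\n" for good_points_any_order(@sample);
```

The trade-off is a little bookkeeping (a depth counter and a buffer) in exchange for not depending on field order.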
 
Tad McClellan

Tim said:
Now I want to extract the points where the polygon type is 'good' so
my output would be:
4,3
2,16
633,2


--------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Text::Balanced 'extract_bracketed';

local $/ = ''; # enable paragraph mode, see perlvar.pod

while ( <DATA> ) {
next unless /type: good.*POINTS\s+\d+/gs;
my $bracketed = extract_bracketed();
print "$bracketed\n";
}


__DATA__
TEXT {
title: test
font: script
}

POLYGON {
name: foo
type: good
POINTS 3 {
4,3
2,16
633,2
}
}

JUNK {
title: nothing
}

POLYGON {
name: foo2
type: bad
POINTS 2 {
7,9
3,2
}

}
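
The script above prints the whole bracketed POINTS block, braces included, while the requested output was the bare point lines. A minimal sketch of one way to get just those, assuming blank-line-separated blocks as in the sample and points that are always "x,y" lines; it uses plain regexes rather than Text::Balanced, and good_points is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the point lines of
# every POLYGON paragraph whose type is 'good'.
sub good_points {
    my ($text) = @_;
    my @points;
    # Split on blank lines, roughly what paragraph mode ($/ = '') does.
    for my $para (split /\n{2,}/, $text) {
        next unless $para =~ /^POLYGON\b/;
        next unless $para =~ /^\s*type:\s*good\s*$/m;
        # Grab the text between the braces of the POINTS block.
        next unless $para =~ /POINTS\s+\d+\s*\{(.*?)\}/s;
        my $body = $1;
        # Keep only lines that look like "x,y".
        push @points, $body =~ /^\s*(\d+,\d+)\s*$/mg;
    }
    return @points;
}

my $sample = <<'END';
TEXT {
title: test
}

POLYGON {
name: foo
type: good
POINTS 3 {
4,3
2,16
633,2
}
}

POLYGON {
name: foo2
type: bad
POINTS 2 {
7,9
3,2
}

}
END

print "$_\n" for good_points($sample);
```

Like the paragraph-mode approach it is modelled on, this breaks if a blank line appears inside a 'good' polygon before its POINTS block closes.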
 
Tim

This is great, and very instructive. Thank you so much. What I
wasn't understanding is how you can nest while(<>) statements. I have
to think a bit more about what is really going on there, but this is
what I was looking for: code that works more in line with how I think
instead of going line by line and doing careful book-keeping. I also
need to look up the 'and last' statement and see what that is doing.
Thanks again.

Tim
 
Martijn Lievaart

I like Tad's solution much better, but fwiw:

- Yes, you can nest while (<>) like this. Normally it is more trouble than
it's worth, but in this case it is appropriate. Just remember that you
are reading the same file with the same file pointer, so when you reach
another while (<>) (or return to an earlier one), it reads on from where
the last one left off.

- The "<condition> and last;" construct is another way of saying "if
(<condition>) { last; }", only shorter. Often seen like this:

# parse config file
while (<$fh>) {
/^\s*$/ and next; # skip empty lines
/^\s*#/ and next; # skip comments
....
}

HTH,
M4
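
Both points above can be seen in a small self-contained sketch. It assumes an in-memory filehandle and made-up block names (A, B) purely for illustration; the inner loop picks up reading exactly where the outer loop stopped, and the "and last" / "or next" forms behave like the equivalent if/unless statements:

```perl
use strict;
use warnings;

# Illustrative input only; block names A and B are made up.
my $input = "A {\n1\n2\n}\nB {\n3\n}\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @log;
while (my $line = <$fh>) {
    chomp $line;
    $line =~ /^(\w+)\s*\{/ or next;    # same as: next unless ...
    my $name = $1;
    my @items;
    # Inner loop shares $fh, so it continues from the current position.
    while (my $inner = <$fh>) {
        chomp $inner;
        $inner =~ /^\}/ and last;      # same as: if (...) { last; }
        push @items, $inner;
    }
    push @log, "$name: @items";
}
close $fh;

print "$_\n" for @log;
# prints:
# A: 1 2
# B: 3
```

The outer loop never sees the lines the inner loop consumed, which is why no extra book-keeping is needed to skip them.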
 
