Matching block of text awk-like: /---/,/---/

A. Farber

Hello,

I'd like to extract a few lines of input,
delimited by the "-----" lines, i.e.

---------------
blah
bleh
blue
---------------

Is there a nice way in Perl to do that,
maybe awk-like /^---/,/^---/ ?

I have forgotten how to do it nicely in Perl,
and it's difficult to find this case in docs/Google

Thanks
Alex
 
A. Farber

Here is how I've done it so far, but I still wonder
if /^---/,/^---/ or alike is supported in Perl:


#!C:\Perl\bin\perl.exe -w

use strict;
use Data::Dumper;

my $CMD = 'javaloader dir';
my %cods;
my $seen;

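# Read the command's output through a pipe
open my $pipe, "$CMD |" or die "Can't run $CMD: $!";
while (<$pipe>) {
    # Toggle the flag on every delimiter line
    $seen = !$seen if /^-----/;

    # Inside a delimited block, store "name value" pairs in lower case
    $cods{lc $1} = lc $2 if $seen && /^(\w+)\s+(.+)$/;
}
close $pipe or die "Can't close $CMD: $!";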

print Dumper(\%cods);
 
Uri Guttman

AF> Hello,
AF> I'd like to extract a few lines of input,
AF> delimited by the "-----" lines, i.e.

AF> ---------------
AF> blah
AF> bleh
AF> blue
AF> ---------------

AF> Is there a nice way in Perl to do that,
AF> maybe awk-like /^---/,/^---/ ?

AF> I have forgotten how to do it nicely in Perl,
AF> and it's difficult to find this case in docs/Google

use the range operator .. in a scalar context. this is also known as the
flip flop (or bistable) operator. it generally behaves like awk's range
of patterns but is more general purpose and isn't tied to just a pair of
patterns and lines like awk.
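As a minimal sketch of the scalar range operator described above: the START/END markers and the sample text are hypothetical (they are not the original poster's data), and an in-memory filehandle stands in for a real file. The second test uses constant operands, which Perl compares against the current line number $.

```perl
use strict;
use warnings;

# Hypothetical sample input with two *different* marker lines.
my $text = "intro\nSTART\nalpha\nbeta\nEND\noutro\n";

# An in-memory filehandle stands in for a real file or pipe.
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my ( @picked, @by_number );
while (<$fh>) {
    chomp;

    # True from the line matching START through the line matching END.
    push @picked, $_ if /^START/ .. /^END/;

    # Constant operands are compared with $. (the input line number),
    # so this is true for input lines 2 through 4.
    push @by_number, $_ if 2 .. 4;
}
close $fh;

print "by pattern: @picked\n";
print "by number:  @by_number\n";
```

With distinct start and end markers the two-dot form is enough; identical markers are a different case.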

but depending on the markers and the file size, these days i prefer to
slurp in the file, match the delimiters and grab the content in a single
regex (awk could never do this easily). it is generally faster than line
by line match/grab with range and it is much simpler code. here is rough
untested code:

use File::Slurp ;

my $text = read_file( $file_name ) ;

# you need /m to allow ^ and $ to match line stuff inside the string
# you need /s to allow . to match \n in the grabbed chunk assuming it is
# multiple lines of text as you show

while( $text =~ /^-+$(.+?)^-+$/msg ) {

    process_stuff( $1 ) ;
}

as i said, that is simpler, faster and easier code than range can
do. and awk can't go near it.
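A self-contained variant of the slurp-and-match idea might look like the sketch below. It is an illustration under assumptions, not the code from the post: the text comes from a here-document instead of File::Slurp's read_file, the blocks are collected in a hypothetical @blocks array instead of being passed to process_stuff, and the regex consumes the newline after the opening delimiter so that the captured chunk does not start with "\n".

```perl
use strict;
use warnings;

# Sample input held in memory; in real use this would be the slurped file.
my $text = <<'END';
header to ignore
---------------
blah
bleh
blue
---------------
noise between blocks
-----
second
-----
END

my @blocks;

# /m lets ^ and $ match at line boundaries inside the string,
# /s lets . match newlines, /g moves on to the next delimited block.
while ( $text =~ /^-+\n(.*?)^-+$/msg ) {
    push @blocks, $1;
}

print "block:\n$_" for @blocks;
```

Each match consumes both delimiter lines, so the text between the first block and the second one ("noise between blocks") is skipped.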

uri
 
Uri Guttman

AF> Here is how I've done it so far, but I still wonder
AF> if /^---/,/^---/ or alike is supported in Perl:


AF> while (<$pipe>) {
AF> $seen = !$seen if /^-----/;

AF> $cods{lc $1} = lc $2 if $seen && /^(\w+)\s+(.+)$/;
AF> }
AF> close $pipe or die "Can't close $CMD: $!";

you just reinvented the scalar range operator. see my other post.

uri
 
Peter Makholm

A. Farber said:
Here is how I've done it so far, but I still wonder
if /^---/,/^---/ or alike is supported in Perl:

You want to look at the range operators, probably the three-dot
version. Look in 'perldoc perlop' under the 'Range Operators' heading.

//Makholm
 
jl_post

I'd like to extract a few lines of input,
delimited by the "-----" lines, i.e.


Use the "..." range operator, like this:

perl -lne "print if /^---/ ... /^---/"

This will print out the "---" lines, though. If you don't want to
print those, you can exclude them with this:

perl -lne "print if /^---/ ... /^---/ and not /^---/"

but make sure that the "not /^---/" part comes AFTER the
"/^---/ ... /^---/" part, or else short-circuit evaluation will skip
the "/^---/ ... /^---/" part on the delimiter lines themselves, so the
range never switches on and the one-liner will not work as you intended.
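The ordering point can be checked with a short script. This is only a sketch: the @lines sample and the two result arrays are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample: one delimited block with text before and after it.
my @lines = ( 'before', '---', '10', '11', '12', '---', 'after' );

my ( @range_first, @not_first );

for (@lines) {
    # Range test first: it sees every line, so it can switch on and off.
    push @range_first, $_ if /^---/ ... /^---/ and not /^---/;
}

for (@lines) {
    # Negation first: on delimiter lines the left side of "and" is false,
    # so the range test is skipped there and never switches on.
    push @not_first, $_ if not /^---/ and /^---/ ... /^---/;
}

print "range first: @range_first\n";
print "not first:   @not_first\n";
```

The first loop collects 10, 11 and 12; the second collects nothing.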

I hope this helps, A. Farber.

-- Jean-Luc
 
Peter Makholm

This will print out the "---" lines, though. If you don't want to
print those, you can exclude them with this:

perl -lne "print if /^---/ ... /^---/ and not /^---/"

but make sure that the "not /^---/" part comes AFTER the
"/^---/ ... /^---/" part,

And note that you have to use the low-precedence 'and' operator instead
of the higher-precedence && operator. Otherwise the negated match will
be parsed as part of the right operand of the range operator, which
will then never be true.
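A small sketch of the precedence difference, again with made-up sample data and hypothetical array names:

```perl
use strict;
use warnings;

# Made-up sample: one delimited block with text before and after it.
my @lines = ( 'before', '---', '10', '11', '12', '---', 'after' );

my ( @low_prec, @high_prec );

for (@lines) {
    # Parsed as: ( /^---/ ... /^---/ ) and ( not /^---/ )
    push @low_prec, $_ if /^---/ ... /^---/ and not /^---/;
}

for (@lines) {
    # Parsed as: /^---/ ... ( /^---/ && !/^---/ )
    # The right operand can never be true, so the range never closes.
    push @high_prec, $_ if /^---/ ... /^---/ && !/^---/;
}

print "with and: @low_prec\n";
print "with &&:  @high_prec\n";
```

With && the range opens at the first delimiter and stays open, so everything from that line to the end of the input is collected, delimiters included.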

//Makholm
 
Uri Guttman

jp> Use the "..." range operator, like this:

you don't need the ... as .. will do fine. he can't match the left and
right sides of .. on the same line as he has two different lines for
delimiting.

jp> perl -lne "print if /^---/ ... /^---/ and not /^---/"

and if that is all he really needs, then he can just skip all --- lines
with a simpler unix grep (the -e stops grep from reading the dashes as
an option):

foo | grep -v -e '----'

OP: is there stuff to ignore between line groups delimited by ---? do
you need to do anything other than print them (as in process them with
more code)?

uri
 
jl_post

"jp" == jl post said:
jp> Use the "..." range operator, like this:

jp> perl -lne "print if /^---/ ... /^---/"

you don't need the ... as .. will do fine. he can't match the left and
right sides of .. on the same line as he has two different lines for
delimiting.

I'm not sure I follow you, Uri. According to the original post,
the delimiting lines are identical. Therefore, if '..' is used, the
range operator will "flip and flop" on the same line, never showing
the lines between the delimiting lines.

To make sure he captures the lines between the '---' lines, the
original poster should use the '...' operator, since according to
"perldoc perlop", the '...' operator won't test the right operand
until the next evaluation (as in sed).

To illustrate, if I run the command:

perl -E "say '---'; say foreach 10,11,12; say '---'"

it gives me this output:

---
10
11
12
---

If I pipe it through the command with the '..' range operator, like
this:

perl -E "say '---'; say foreach 10,11,12; say '---'" | perl -lne "print if /^---/ .. /^---/"

the output is:

---
---

showing us that the range operator flipped and flopped on the same
line, never printing anything but the "---" lines.

But if we replace the '..' operator with '...', in this command:

perl -E "say '---'; say foreach 10,11,12; say '---'" | perl -lne "print if /^---/ ... /^---/"

we get this output:

---
10
11
12
---

So in this case, it does matter whether '..' or '...' is used.
 
Uri Guttman

jp> perl -lne "print if /^---/ ... /^---/"


jp> I'm not sure I follow you, Uri. According to the original post,
jp> the delimiting lines are identical. Therefore, if '..' is used, the
jp> range operator will "flip and flop" on the same line, never showing
jp> the lines between the delimiting lines.

maybe i flip flopped on which op tests the same lines. i haven't used
them in a good while as i use the slurp/extract technique when i need to
do this.

jp> So in this case, it does matter whether '..' or '...' is used.

seems like it. the slurp/extract technique is still faster and better. i
have used the flip flop many times in the distant past but not in a good
while.

uri
 
