Parsing indented text file

Hi,

I have an input file which looks like this:


aaaaaaa1
  bb1
    ccc1
    ccc2
    ccc3
  bb2
    ccc4
    ccc5
    ccc6
aaaaaaa2
  bb3
    ccc7
    ccc8
    ccc9
  bb4
    ccc10
    ccc11
    ccc12
  bb5
    ccc13
    ccc14
    ccc15
  bb6
    ccc16
    ccc17
    ccc18
aaaaaaa3
....

and so on. The letters are the same for the same indent
level. They don't mean that the data there is the same. I am
trying to do the following: as long as the indent level gets
deeper, I want to combine the lines into strings like this:

aaaaaaa1/bb1/ccc1
aaaaaaa1/bb1/ccc2
aaaaaaa1/bb1/ccc3
aaaaaaa1/bb2/ccc4
....
aaaaaaa2/bb3/ccc7

and so on.

I have tried the following:

+start_of_code

#!/usr/bin/perl
use warnings;
use strict;

my $prev_indent = -1;

my $curr_wanted_string = '';

while (<DATA>) {
    chomp;

    my $curr_indent = length $1 if m/^(\s*)/;

    s/^\s*|\s*$//;

    if ( $curr_indent > $prev_indent ) {
        $curr_wanted_string .= "$_/";
        $prev_indent = $curr_indent;
    }
    else {
        print "[$curr_wanted_string]\n";
        $curr_wanted_string = '';
    }
}


__DATA__
The above input here (didn't want to make post too long).

- end_of_code


The result, though, is that I get only the first string and
then many empty strings:

[aaaaaaa1/bb1/ccc1/]
[]
[]
[]
....


Can you please help me fix this?


Thank you,
Angie
 

Mumia W.


Ultimately, you want to print a string consisting of three
parts separated by slashes:

print "$part1/$part2/$part3\n";

To do this, you need to set up the three parts before
printing. You'll use a while loop to look at the lines in the
data, so you'll only see one part at a time, which means that
you have to remember each part when you see it (store it in
$part1, $part2 or $part3).

Let's start with the while loop:

while (<DATA>) {
    chomp;
}

You only want to print when you see something that is at
indent level 2 (counting from 0), e.g. "ccc1", so you should
find out the current indent level. The code assumes two spaces
of indentation per level:

while (<DATA>) {
    chomp;
    my $indent_level = 0;
    if (m/^( +)/) {
        $indent_level = length($1) / 2;
    }
    print "$indent_level: $_\n";
}

Now that you know the indent level, you can know which part
each line belongs to:

while (<DATA>) {
    chomp;
    my $indent_level = 0;
    if (m/^( +)/) {
        $indent_level = length($1) / 2;
    }
    s/^\s+|\s+$//g;    # /g is needed to trim both ends
    if (0 == $indent_level) { $part1 = $_ }
    if (1 == $indent_level) { $part2 = $_ }
    if (2 == $indent_level) { $part3 = $_ }
}

You only want to print when you've gotten to the last indent
level, e.g. "ccc3":

while (<DATA>) {
    chomp;
    my $indent_level = 0;
    if (m/^( +)/) {
        $indent_level = length($1) / 2;
    }
    s/^\s+|\s+$//g;    # /g is needed to trim both ends
    if (0 == $indent_level) { $part1 = $_ }
    if (1 == $indent_level) { $part2 = $_ }
    if (2 == $indent_level) { $part3 = $_ }
    print "$part1/$part2/$part3\n" if ($indent_level == 2);
}

__TA_DA__
:)

This is not the completed program. Naturally, you'd enable
strictures and warnings first, and you'd have to declare $part1,
$part2 and $part3 before the while loop.
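
For reference, a completed version might look roughly like the
sketch below. It assumes two spaces per indent level and exactly
three levels, as in your sample. The sub name build_paths and the
inline sample lines are only there for illustration; with your real
data you would feed it the lines read from DATA or a file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the input lines, returns the joined strings.
# Assumes two spaces per indent level and at most three levels (0, 1, 2).
sub build_paths {
    my @lines = @_;
    my ($part1, $part2, $part3) = ('', '', '');
    my @paths;

    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines

        my $indent_level = 0;
        if ($line =~ m/^( +)/) {
            $indent_level = length($1) / 2;
        }
        $line =~ s/^\s+|\s+$//g;      # trim both ends

        # Remember the most recent text seen at each level.
        if (0 == $indent_level) { $part1 = $line }
        if (1 == $indent_level) { $part2 = $line }
        if (2 == $indent_level) { $part3 = $line }

        # Only the deepest level produces an output string.
        push @paths, "$part1/$part2/$part3" if 2 == $indent_level;
    }
    return @paths;
}

# Shortened sample in the same shape as the posted input.
my @sample = (
    "aaaaaaa1",
    "  bb1",
    "    ccc1",
    "    ccc2",
    "  bb2",
    "    ccc4",
    "aaaaaaa2",
    "  bb3",
    "    ccc7",
);
print "$_\n" for build_paths(@sample);
```

Collecting the strings in a list rather than printing inside the
loop is just a convenience so the result can be checked or reused.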

HTH
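
P.S. If the nesting can go deeper than three levels, one possible
variation is to keep the current parts in an array instead of three
scalars. The sketch below is only an illustration under my own
assumptions: two spaces per indent level, spaces only (no tabs), and
a line counts as a "leaf" when the next line is not indented deeper.
The sub name leaf_paths is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for any nesting depth.
# Assumes two spaces per indent level; a line is treated as a leaf
# (and emitted) when the following line is not indented deeper.
sub leaf_paths {
    my @lines = grep { /\S/ } @_;    # ignore blank lines
    my (@parts, @paths);

    for my $i (0 .. $#lines) {
        my ($spaces, $text) = $lines[$i] =~ m/^( *)(.*?)\s*$/;
        my $level = int(length($spaces) / 2);

        # Keep the ancestors, drop anything at this level or deeper.
        splice(@parts, $level) if @parts > $level;
        push @parts, $text;

        # Look ahead: is the next line indented deeper than this one?
        my $next_level = -1;
        if ($i < $#lines) {
            my ($next_spaces) = $lines[$i + 1] =~ m/^( *)/;
            $next_level = int(length($next_spaces) / 2);
        }
        push @paths, join('/', @parts) if $next_level <= $level;
    }
    return @paths;
}

print "$_\n" for leaf_paths("a", "  b", "    c", "      d", "    e", "f");
```

Note that, unlike the three-part version, this also emits a
top-level line that has nothing nested under it, which may or may
not be what you want.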
 
