Converting indented data to a tree

T

Tore Aursand

Hi!

I need to convert a large text file with indented data into a tree (ie.
into a child-parent relationship).

The text file looks like this:

Page 1
Page 1.1
Page 1.1.1
Page 1.1.2
Page 1.2
Page 1.3
Page 2
Page 2.1

I know that for each "level", there are a known number of spaces (in this
case 4), or 0 spaces if we're on a root level.

For each line I want to assign an incremental counter, so that I should
end up with an array of arrays representing the tree. For the example
above the array would look like this:

@array = (
[1,0], # Page 1
[2,1], # Page 1.1
[3,2], # Page 1.1.1
[4,2], # Page 1.1.2
[5,1], # Page 1.2
[6,1], # Page 1.3
[7,0], # Page 2
[8,7], # Page 2.1
);

I guess you'll get the idea. I'm totally stuck on this one, and I would
like some help from you guys (and girls).

Thanks in advance!
 
A

Anno Siegel

Tore Aursand said:
Hi!

I need to convert a large text file with indented data into a tree (ie.
into a child-parent relationship).

The text file looks like this:

Page 1
Page 1.1
Page 1.1.1
Page 1.1.2
Page 1.2
Page 1.3
Page 2
Page 2.1

I know that for each "level", there are a known number of spaces (in this
case 4), or 0 spaces if we're on a root level.

For each line I want to assign an incremental counter, so that I should
end up with an array of arrays representing the tree. For the example
above the array would look like this:

@array = (
[1,0], # Page 1
[2,1], # Page 1.1
[3,2], # Page 1.1.1
[4,2], # Page 1.1.2
[5,1], # Page 1.2
[6,1], # Page 1.3
[7,0], # Page 2
[8,7], # Page 2.1
);

I guess you'll get the idea.

I thought I did, until I sw the last item. Should "[8,7]" be "[8,1]"?
Also, the first component seems to be nothing but a counter, one more
than the index. What is it for?
I'm totally stuck on this one, and I would
like some help from you guys (and girls).

The array is almost trivial to build:

my @array;
my $count = 1;
while ( <DATA> ) {
my ( $leading) = /^( *)/;
push @array, [ $count++, length( $leading)/4];
}

I don't see how the array represents the tree. It only holds the
level of each node, but not its relation to other nodes.

Anno
 
T

Tore Aursand

@array = (
[1,0], # Page 1
[2,1], # Page 1.1
[3,2], # Page 1.1.1
[4,2], # Page 1.1.2
[5,1], # Page 1.2
[6,1], # Page 1.3
[7,0], # Page 2
[8,7], # Page 2.1
);
I thought I did, until I sw the last item. Should "[8,7]" be "[8,1]"?
Also, the first component seems to be nothing but a counter, one more
than the index. What is it for?

*trying to figure out what's Anno is thinking* Ah! No, no. Guess I
should have explained it a little better.

The "innermost" array is a [id, parent_id] thing. While the 'id' can be a
simple counter (incremental), I need to keep track of the relationship to
the parent also.

While writing this message I got the idea of push'ing and pop'ing the
current parent '$count' onto/from an array, and it _seems to_ work;

my @array = ();
my $this_level = 0;
my $prev_level = 0;
my @parents = (0);
my $count = 0;

while ( <DATA> ) {
chomp;
next unless ( length );
$count++;

my ( $spaces ) = m,^( *),;
$this_level = length( $spaces ) / 4;

if ( $this_level > $prev_level ) {
push( @parents, $count );
}
elsif ( $this_level < $prev_level ) {
pop( @parents );
}

push( @array, [$count, $parents[-1]] );
$prev_level = $this_level;
}

Anyone for a better solution?
 
S

Steven Kuo

@array = (
[1,0], # Page 1
[2,1], # Page 1.1
[3,2], # Page 1.1.1
[4,2], # Page 1.1.2
[5,1], # Page 1.2
[6,1], # Page 1.3
[7,0], # Page 2
[8,7], # Page 2.1
);
I thought I did, until I sw the last item. Should "[8,7]" be "[8,1]"?
Also, the first component seems to be nothing but a counter, one more
than the index. What is it for?

*trying to figure out what's Anno is thinking* Ah! No, no. Guess I
should have explained it a little better.

The "innermost" array is a [id, parent_id] thing. While the 'id' can be a
simple counter (incremental), I need to keep track of the relationship to
the parent also.

(snipped)



Perhaps something like:

use Data::Dumper;

my %parent;
my @array;

while (<DATA>) {
chomp;
my $leading_spaces = 0;
++$leading_spaces while (/\G /g);
$leading_spaces /= 4;
if ($leading_spaces) {
push @array, [ $., $parent{$leading_spaces-1}];
} else {
push @array, [$., 0];
}

$parent{$leading_spaces} = $.;

}

print Dumper(\@array);

__DATA__
Page 1
Page 1.1
Page 1.1.1
Page 1.1.2
Page 1.2
Page 1.3
Page 2
Page 2.1
 
J

Jay Tilton

: The "innermost" array is a [id, parent_id] thing. While the 'id' can be a
: simple counter (incremental), I need to keep track of the relationship to
: the parent also.
:
: While writing this message I got the idea of push'ing and pop'ing the
: current parent '$count' onto/from an array, and it _seems to_ work;
:
: my @array = ();
: my $this_level = 0;
: my $prev_level = 0;
: my @parents = (0);
: my $count = 0;
:
: while ( <DATA> ) {
: chomp;
: next unless ( length );
: $count++;
:
: my ( $spaces ) = m,^( *),;
: $this_level = length( $spaces ) / 4;
:
: if ( $this_level > $prev_level ) {
: push( @parents, $count );
^^^^^^
Not really what you're after. The resulting data structure will end up
pointing to the parent node's first child instead of pointing to the
parent node itself.

push( @parents, $count-1 );

: }
: elsif ( $this_level < $prev_level ) {
: pop( @parents );

It's possible for $this_level to drop more than one from $prev_level.
If your data went like this:

Page 1
Page 1.1
Page 1.1.1
Page 1.1.2
Page 2

then the tree outline would be thrown out of alignment.
splice() will work better than pop() :

splice @parents, $this_level - $prev_level;

: }
:
: push( @array, [$count, $parents[-1]] );
: $prev_level = $this_level;
: }
:
: Anyone for a better solution?

Use the level to index the @parents array directly instead of
adding/removing elements at its end.

my @parents = 0;
my @array;
while(<DATA>) {
my $depth = length( (/^( *)/)[0] ) / 4;
$parents[$depth+1] = $. ;
push @array, [ $. , $parents[$depth] ];
}
 
T

Tore Aursand

if ( $this_level > $prev_level ) {
push( @parents, $count );
}
elsif ( $this_level < $prev_level ) {
pop( @parents );
}

Found out that the last elsif won't work, 'cause there will be times when
I need to "jump" from level 4 to level 1.

Solved it by doing this instead:

elsif ( $this_level < $prev_level ) {
pop( @parents ) for ( $this_level .. ($prev_level - 1) );
}

Tata!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top