Converting indented data to a tree

Discussion in 'Perl Misc' started by Tore Aursand, Oct 15, 2003.

  1. Tore Aursand

    Tore Aursand Guest

    Hi!

    I need to convert a large text file with indented data into a tree (ie.
    into a child-parent relationship).

    The text file looks like this:

    Page 1
    Page 1.1
    Page 1.1.1
    Page 1.1.2
    Page 1.2
    Page 1.3
    Page 2
    Page 2.1

    I know that for each "level", there are a known number of spaces (in this
    case 4), or 0 spaces if we're on a root level.

    For each line I want to assign an incremental counter, so that I should
    end up with an array of arrays representing the tree. For the example
    above the array would look like this:

    @array = (
    [1,0], # Page 1
    [2,1], # Page 1.1
    [3,2], # Page 1.1.1
    [4,2], # Page 1.1.2
    [5,1], # Page 1.2
    [6,1], # Page 1.3
    [7,0], # Page 2
    [8,7], # Page 2.1
    );

    I guess you'll get the idea. I'm totally stuck on this one, and I would
    like some help from you guys (and girls).

    Thanks in advance!


    --
    Tore Aursand <>
    Tore Aursand, Oct 15, 2003
    #1
    1. Advertising

  2. Tore Aursand

    Anno Siegel Guest

    Tore Aursand <> wrote in comp.lang.perl.misc:
    > Hi!
    >
    > I need to convert a large text file with indented data into a tree (ie.
    > into a child-parent relationship).
    >
    > The text file looks like this:
    >
    > Page 1
    > Page 1.1
    > Page 1.1.1
    > Page 1.1.2
    > Page 1.2
    > Page 1.3
    > Page 2
    > Page 2.1
    >
    > I know that for each "level", there are a known number of spaces (in this
    > case 4), or 0 spaces if we're on a root level.
    >
    > For each line I want to assign an incremental counter, so that I should
    > end up with an array of arrays representing the tree. For the example
    > above the array would look like this:
    >
    > @array = (
    > [1,0], # Page 1
    > [2,1], # Page 1.1
    > [3,2], # Page 1.1.1
    > [4,2], # Page 1.1.2
    > [5,1], # Page 1.2
    > [6,1], # Page 1.3
    > [7,0], # Page 2
    > [8,7], # Page 2.1
    > );
    >
    > I guess you'll get the idea.


    I thought I did, until I sw the last item. Should "[8,7]" be "[8,1]"?
    Also, the first component seems to be nothing but a counter, one more
    than the index. What is it for?

    > I'm totally stuck on this one, and I would
    > like some help from you guys (and girls).


    The array is almost trivial to build:

    my @array;
    my $count = 1;
    while ( <DATA> ) {
    my ( $leading) = /^( *)/;
    push @array, [ $count++, length( $leading)/4];
    }

    I don't see how the array represents the tree. It only holds the
    level of each node, but not its relation to other nodes.

    Anno
    Anno Siegel, Oct 15, 2003
    #2
    1. Advertising

  3. Tore Aursand

    Tore Aursand Guest

    On Wed, 15 Oct 2003 07:45:19 +0000, Anno Siegel wrote:
    >> @array = (
    >> [1,0], # Page 1
    >> [2,1], # Page 1.1
    >> [3,2], # Page 1.1.1
    >> [4,2], # Page 1.1.2
    >> [5,1], # Page 1.2
    >> [6,1], # Page 1.3
    >> [7,0], # Page 2
    >> [8,7], # Page 2.1
    >> );


    > I thought I did, until I sw the last item. Should "[8,7]" be "[8,1]"?
    > Also, the first component seems to be nothing but a counter, one more
    > than the index. What is it for?


    *trying to figure out what's Anno is thinking* Ah! No, no. Guess I
    should have explained it a little better.

    The "innermost" array is a [id, parent_id] thing. While the 'id' can be a
    simple counter (incremental), I need to keep track of the relationship to
    the parent also.

    While writing this message I got the idea of push'ing and pop'ing the
    current parent '$count' onto/from an array, and it _seems to_ work;

    my @array = ();
    my $this_level = 0;
    my $prev_level = 0;
    my @parents = (0);
    my $count = 0;

    while ( <DATA> ) {
    chomp;
    next unless ( length );
    $count++;

    my ( $spaces ) = m,^( *),;
    $this_level = length( $spaces ) / 4;

    if ( $this_level > $prev_level ) {
    push( @parents, $count );
    }
    elsif ( $this_level < $prev_level ) {
    pop( @parents );
    }

    push( @array, [$count, $parents[-1]] );
    $prev_level = $this_level;
    }

    Anyone for a better solution?


    --
    Tore Aursand <>
    Tore Aursand, Oct 15, 2003
    #3
  4. Tore Aursand

    Steven Kuo Guest

    On Wed, 15 Oct 2003, Tore Aursand wrote:

    > On Wed, 15 Oct 2003 07:45:19 +0000, Anno Siegel wrote:
    > >> @array = (
    > >> [1,0], # Page 1
    > >> [2,1], # Page 1.1
    > >> [3,2], # Page 1.1.1
    > >> [4,2], # Page 1.1.2
    > >> [5,1], # Page 1.2
    > >> [6,1], # Page 1.3
    > >> [7,0], # Page 2
    > >> [8,7], # Page 2.1
    > >> );

    >
    > > I thought I did, until I sw the last item. Should "[8,7]" be "[8,1]"?
    > > Also, the first component seems to be nothing but a counter, one more
    > > than the index. What is it for?

    >
    > *trying to figure out what's Anno is thinking* Ah! No, no. Guess I
    > should have explained it a little better.
    >
    > The "innermost" array is a [id, parent_id] thing. While the 'id' can be a
    > simple counter (incremental), I need to keep track of the relationship to
    > the parent also.


    (snipped)



    Perhaps something like:

    use Data::Dumper;

    my %parent;
    my @array;

    while (<DATA>) {
    chomp;
    my $leading_spaces = 0;
    ++$leading_spaces while (/\G /g);
    $leading_spaces /= 4;
    if ($leading_spaces) {
    push @array, [ $., $parent{$leading_spaces-1}];
    } else {
    push @array, [$., 0];
    }

    $parent{$leading_spaces} = $.;

    }

    print Dumper(\@array);

    __DATA__
    Page 1
    Page 1.1
    Page 1.1.1
    Page 1.1.2
    Page 1.2
    Page 1.3
    Page 2
    Page 2.1



    --
    Hope this helps,
    Steven
    Steven Kuo, Oct 15, 2003
    #4
  5. Tore Aursand

    Jay Tilton Guest

    Tore Aursand <> wrote:

    : The "innermost" array is a [id, parent_id] thing. While the 'id' can be a
    : simple counter (incremental), I need to keep track of the relationship to
    : the parent also.
    :
    : While writing this message I got the idea of push'ing and pop'ing the
    : current parent '$count' onto/from an array, and it _seems to_ work;
    :
    : my @array = ();
    : my $this_level = 0;
    : my $prev_level = 0;
    : my @parents = (0);
    : my $count = 0;
    :
    : while ( <DATA> ) {
    : chomp;
    : next unless ( length );
    : $count++;
    :
    : my ( $spaces ) = m,^( *),;
    : $this_level = length( $spaces ) / 4;
    :
    : if ( $this_level > $prev_level ) {
    : push( @parents, $count );
    ^^^^^^
    Not really what you're after. The resulting data structure will end up
    pointing to the parent node's first child instead of pointing to the
    parent node itself.

    push( @parents, $count-1 );

    : }
    : elsif ( $this_level < $prev_level ) {
    : pop( @parents );

    It's possible for $this_level to drop more than one from $prev_level.
    If your data went like this:

    Page 1
    Page 1.1
    Page 1.1.1
    Page 1.1.2
    Page 2

    then the tree outline would be thrown out of alignment.
    splice() will work better than pop() :

    splice @parents, $this_level - $prev_level;

    : }
    :
    : push( @array, [$count, $parents[-1]] );
    : $prev_level = $this_level;
    : }
    :
    : Anyone for a better solution?

    Use the level to index the @parents array directly instead of
    adding/removing elements at its end.

    my @parents = 0;
    my @array;
    while(<DATA>) {
    my $depth = length( (/^( *)/)[0] ) / 4;
    $parents[$depth+1] = $. ;
    push @array, [ $. , $parents[$depth] ];
    }
    Jay Tilton, Oct 15, 2003
    #5
  6. Tore Aursand

    Tore Aursand Guest

    On Wed, 15 Oct 2003 20:32:26 +0200, Tore Aursand wrote:
    > if ( $this_level > $prev_level ) {
    > push( @parents, $count );
    > }
    > elsif ( $this_level < $prev_level ) {
    > pop( @parents );
    > }


    Found out that the last elsif won't work, 'cause there will be times when
    I need to "jump" from level 4 to level 1.

    Solved it by doing this instead:

    elsif ( $this_level < $prev_level ) {
    pop( @parents ) for ( $this_level .. ($prev_level - 1) );
    }

    Tata!


    --
    Tore Aursand <>
    Tore Aursand, Oct 16, 2003
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Steve Maring

    making indented XML

    Steve Maring, Jan 10, 2004, in forum: Java
    Replies:
    0
    Views:
    590
    Steve Maring
    Jan 10, 2004
  2. Stub

    B tree, B+ tree and B* tree

    Stub, Nov 12, 2003, in forum: C Programming
    Replies:
    3
    Views:
    10,091
  3. sharan
    Replies:
    4
    Views:
    674
    CBFalconer
    Oct 30, 2007
  4. sharan
    Replies:
    2
    Views:
    821
    SM Ryan
    Oct 31, 2007
  5. sharan
    Replies:
    1
    Views:
    677
    CBFalconer
    Oct 30, 2007
Loading...

Share This Page