convert structured strings to possibly deep hash of hashes

Discussion in 'Perl Misc' started by seven.reeds, Aug 9, 2006.

  1. seven.reeds

    seven.reeds Guest

    Hi,

    I have a list of well structured strings, actually they are file paths.
    This just measn there are strings of '/' seperated sub-strings. I can
    easily split these into an array. My question is really one of
    building a HoH based on all of the string records. My goal is to take
    strings like:

    /file.txt
    /a/file.txt
    /a/b/c
    /a/b/c/file.txt
    /z/m/w/file.txt

    and produce something like:

    %dir_hash(
    'file.txt' => '',
    'a' => {
    'file.txt' => '',
    'b' => {
    'c' => {
    'file.txt'
    }
    }
    },
    'z' => {
    'm' => {
    'w' => {
    'file.txt' => ''
    }
    }
    }
    )

    My current feeble attempt was the following. I am getting lost in the
    "\%"s and the "%{}"s. To pre-answer some obvious suggestions: yes, I
    have read perlref and perlreftut and my perl book(s) but I am not
    seeing how to recursibely build/populate a HOH in this situation. I
    have looked through CPAN and did find the IO::Dir module which might
    help if the data was not static and i was pulling it from the
    originating directory instead of a flat file.

    suggestions?



    #!/usr/local/bin/perl -w

    use strict;
    select(STDOUT);
    $|++;

    use Data::Dumper;

    my @path = ();
    my %path_hash = ();

    @path = split('/', "/a/b/c/d/e/f/g.ics");
    shift(@path) if ($path[0] eq "");

    &make_path_hash(\%path_hash, @path);

    print STDOUT Dumper(%path_hash);


    sub make_path_hash
    {
    my ($path_hash, $this, @list) = @_;

    return if (! defined($this));

    $this =~ s/^\s+//s;
    $this =~ s/\s+$//s;

    return if ($this eq "");

    $path_hash->{$this} = 1 if (! defined($path_hash->{$this}));

    &make_path_hash(\%{ $path_hash->{$this} }, @list);
    }
    seven.reeds, Aug 9, 2006
    #1
    1. Advertising

  2. seven.reeds

    -berlin.de Guest

    seven.reeds <> wrote in comp.lang.perl.misc:
    > Hi,
    >
    > I have a list of well structured strings, actually they are file paths.
    > This just measn there are strings of '/' seperated sub-strings. I can
    > easily split these into an array. My question is really one of
    > building a HoH based on all of the string records. My goal is to take
    > strings like:
    >
    > /file.txt
    > /a/file.txt
    > /a/b/c
    > /a/b/c/file.txt
    > /z/m/w/file.txt
    >
    > and produce something like:
    >
    > %dir_hash(
    > 'file.txt' => '',
    > 'a' => {
    > 'file.txt' => '',
    > 'b' => {
    > 'c' => {
    > 'file.txt'
    > }
    > }
    > },
    > 'z' => {
    > 'm' => {
    > 'w' => {
    > 'file.txt' => ''
    > }
    > }
    > }
    > )


    Here is a recursive method:

    my @files = qw(
    /file.txt
    /a/file.txt
    /a/b/c
    /a/b/c/file.txt
    /z/m/w/file.txt
    );

    my %dir_hash;
    for ( @files ) {
    my ( @parts) = split m{/};
    add_parts( \ %dir_hash, @parts);
    }
    print Dumper \ %dir_hash;
    exit;

    sub add_parts {
    my ( $hash, $first, @parts) = @_;
    $hash->{ $first} ||= {};
    if ( @parts ) {
    add_parts( $hash->{ $first}, @parts);
    } else {
    $hash->{ $first} = '';
    }
    }

    It doesn't reproduce your structure exactly because all files are
    given absolute path names. Therefore the root of the file system
    is represented in the hash as a top node (which is an empty string).
    Specify the files with relative path names (remove the leading "/"s)
    to avoid that.

    Anno
    -berlin.de, Aug 9, 2006
    #2
    1. Advertising

  3. Michele Dondi wrote:
    > my $last=\%dirhash;
    > $last=$last->{$_} ||= {} for split qr|/|;


    I generally prefer to avoid the extra level of empty hash wth a double
    reference and autovivification.

    my $last=\\%dirhash;
    $last=\$$last->{$_} for split qr|/|;
    Brian McCauley, Aug 9, 2006
    #3
  4. seven.reeds

    seven.reeds Guest

    very cool indeed. Thank you all
    seven.reeds, Aug 9, 2006
    #4
  5. seven.reeds

    Uri Guttman Guest

    >>>>> "MD" == Michele Dondi <> writes:

    MD> On 9 Aug 2006 05:31:53 -0700, "Brian McCauley" <>
    MD> wrote:

    >>> my $last=\%dirhash;
    >>> $last=$last->{$_} ||= {} for split qr|/|;

    >>
    >> I generally prefer to avoid the extra level of empty hash wth a double
    >> reference and autovivification.
    >>
    >> my $last=\\%dirhash;
    >> $last=\$$last->{$_} for split qr|/|;


    MD> Cool! Hadn't thought of it and I was just thinking: I wish perl would
    MD> autovivify this too...

    that is how i would have done it too. the recursive version that anno
    posted was too much work and this looping/autovivify concept works well.

    i have similar code in my autoviv tutorial at:

    http://sysarch.com/Perl/autoviv.txt

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Aug 9, 2006
    #5
  6. seven.reeds

    -berlin.de Guest

    A. Sinan Unur <> wrote in comp.lang.perl.misc:
    > Uri Guttman <> wrote in
    > news::
    >
    > > i have similar code in my autoviv tutorial at:
    > >
    > > http://sysarch.com/Perl/autoviv.txt

    >
    > Looking at your deep_exists sub:
    >
    > sub deep_exists {
    > my( $hash_ref, @keys ) = @_ ;
    > unless ( @keys ) {
    > warn "deep_exists: no keys" ;
    > return ;
    > }
    >
    > foreach my $key ( @keys ) {
    > unless( ref $hash_ref eq 'HASH' ) {
    > warn "$hash_ref not a HASH ref" ;
    > return ;
    > }
    >
    > return 0 unless exists( $hash_ref->{$key} ) ;
    > # ^^^^^^^^
    > # Shouldn't this be a plain return ?
    > #


    Absolutely not!

    Like its model exists(), deep_exists() is supposed to return a
    boolean value. Returning empty is not a "better way to return
    false" it is a way of indicating failure. Thus returning empty
    as it does when no keys are given or if a ref is not a hash is
    adequate, but not when non-existence is detected.

    If you called deep_exists in a map like this:

    my @res = map deep_exists( $_, @keys), @hashes;

    you'd expect an array as long as @hashes filled with true and
    false values, not an array of only true values, as many as were
    returned.

    Anno
    -berlin.de, Aug 9, 2006
    #6
  7. seven.reeds

    DJ Stunks Guest

    Michele Dondi wrote:
    > On 9 Aug 2006 05:31:53 -0700, "Brian McCauley" <>
    > wrote:
    >
    > >> my $last=\%dirhash;
    > >> $last=$last->{$_} ||= {} for split qr|/|;

    > >
    > >I generally prefer to avoid the extra level of empty hash wth a double
    > >reference and autovivification.
    > >
    > >my $last=\\%dirhash;
    > >$last=\$$last->{$_} for split qr|/|;

    >
    > Cool! Hadn't thought of it and I was just thinking: I wish perl would
    > autovivify this too...


    you two are too clever... :)

    -jp
    DJ Stunks, Aug 9, 2006
    #7
  8. seven.reeds

    Uri Guttman Guest

    >>>>> "ASU" == A Sinan Unur <> writes:

    ASU> -berlin.de wrote in news:4jum1bF9svqfU1
    ASU> @news.dfncis.de:

    >> A. Sinan Unur <> wrote in comp.lang.perl.misc:
    >>> Uri Guttman <> wrote in
    >>> news::
    >>>
    >>> > i have similar code in my autoviv tutorial at:
    >>> >
    >>> > http://sysarch.com/Perl/autoviv.txt
    >>>
    >>> Looking at your deep_exists sub:
    >>>
    >>> sub deep_exists {


    ASU> ...

    >>>
    >>> return 0 unless exists( $hash_ref->{$key} ) ;
    >>> # ^^^^^^^^
    >>> # Shouldn't this be a plain return ?
    >>> #

    >>
    >> Absolutely not!


    ASU> ...

    >> If you called deep_exists in a map like this:
    >>
    >> my @res = map deep_exists( $_, @keys), @hashes;
    >>
    >> you'd expect an array as long as @hashes filled with true and
    >> false values, not an array of only true values, as many as were
    >> returned.


    and that could be fixed by calling scalar on deep_exists.

    ASU> Ah-a! It makes a lot of sense with the usage example you give above.

    ASU> I can think of some other usage scenarios where the other way
    ASU> seems to make sense to me, but since I have never written code
    ASU> that uses a deep_exists style test, I'll assume that the example
    ASU> you give is the appropriate case to consider. Thank you for the
    ASU> clarification.

    the issue of what to return for boolean is split. i currently prefer
    plain return as in stem it is needed in message delivery to denote NO
    VALUE returned as those methods are called in list context. return undef
    is a legit return and will cause a return message sent back whereas
    plain return won't.

    if you call something as in the map case above or in an arg list and do
    a plain return you get an empty list which will be thrown away. so some
    coders claim the sub should handle that and return a scalar value or a
    real undef. i say the caller should force the scalar context if they
    want a scalar value in a list context. the caller knows more about what
    they want and they can control things with scalar. what if you did want
    an empty list when assigning to an array and not a single entry with
    undef? the caller cannot fix that without a special case checking for
    just undef in the first slot. so i say my way of using a plain return is
    more flexible and puts the onus on the caller to do the correct thing
    with the result. plain return was designed to return undef in scalar and
    () in list so it would be false in both scalar and list contexts. just
    using scalar() will make that work in a list context and is clearer to
    the reader that you are expecting a scalar (boolean) there.

    now, i wrote that tutorial a long time ago and that code isn't something
    i use. i would probably keep a literal false there if i used it since it
    only can return a boolean but i am also leaning to a plain return due to
    my reasoning above. in my production code i always use plain return (as
    is required in stem).

    but this was a good argument for now i have logical backing for my
    answer. a caller can use scalar in a list context to handle a plain
    return but there is no easy way to fix the problem of a literal undef
    boolean returned to an array.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Aug 9, 2006
    #8
  9. seven.reeds

    -berlin.de Guest

    Uri Guttman <> wrote in comp.lang.perl.misc:
    > >>>>> "ASU" == A Sinan Unur <> writes:

    >
    > ASU> -berlin.de wrote in news:4jum1bF9svqfU1
    > ASU> @news.dfncis.de:
    >
    > >> A. Sinan Unur <> wrote in comp.lang.perl.misc:
    > >>> Uri Guttman <> wrote in
    > >>> news::
    > >>>
    > >>> > i have similar code in my autoviv tutorial at:
    > >>> >
    > >>> > http://sysarch.com/Perl/autoviv.txt
    > >>>
    > >>> Looking at your deep_exists sub:
    > >>>
    > >>> sub deep_exists {

    >
    > ASU> ...
    >
    > >>>
    > >>> return 0 unless exists( $hash_ref->{$key} ) ;
    > >>> # ^^^^^^^^
    > >>> # Shouldn't this be a plain return ?
    > >>> #
    > >>
    > >> Absolutely not!

    >
    > ASU> ...
    >
    > >> If you called deep_exists in a map like this:
    > >>
    > >> my @res = map deep_exists( $_, @keys), @hashes;
    > >>
    > >> you'd expect an array as long as @hashes filled with true and
    > >> false values, not an array of only true values, as many as were
    > >> returned.

    >
    > and that could be fixed by calling scalar on deep_exists.
    >
    > ASU> Ah-a! It makes a lot of sense with the usage example you give above.
    >
    > ASU> I can think of some other usage scenarios where the other way
    > ASU> seems to make sense to me, but since I have never written code
    > ASU> that uses a deep_exists style test, I'll assume that the example
    > ASU> you give is the appropriate case to consider. Thank you for the
    > ASU> clarification.
    >
    > the issue of what to return for boolean is split. i currently prefer
    > plain return as in stem it is needed in message delivery to denote NO
    > VALUE returned as those methods are called in list context. return undef
    > is a legit return and will cause a return message sent back whereas
    > plain return won't.


    I can see situations where that behavior would be useful, but I wouldn't
    make it the standard. You wouldn't write

    is_even {
    my $n = shift;
    return if $n % 2;
    return 1;
    }

    would you?

    Here is my take:

    First, the question is clear for functions that return multiple values:
    Returning empty is just the special case where the number of returned
    values is 0. If an error situation must be distinguished from that
    case, another solution must be sought.

    So the question only applies to functions that normally return a single
    scalar. In that case, returning empty should be reserved to signal
    failure. Even if the correct value to return is undef, the function
    should return that as a single value. The user can decide whether
    the difference matters:

    my $res = func( ...);

    will set $res to undef in failure situations and when a legit undef
    is returned. If the difference doesn't matter, that's all that's
    needed. If the difference does matter, the simple device of

    if ( my ( $res) = func( ...) ) {
    # got a value, use it
    } else {
    # failure
    }

    brings it out.

    > if you call something as in the map case above or in an arg list and do
    > a plain return you get an empty list which will be thrown away. so some
    > coders claim the sub should handle that and return a scalar value or a
    > real undef. i say the caller should force the scalar context ...


    Again, I can see that this might useful behavior for a certain set
    of functions, but in general a function that is supposed to return
    a scalar should only fail to return a scalar if there is no scalar
    to return, not if the scalar happens to be undefined.

    Anno
    -berlin.de, Aug 10, 2006
    #9
  10. seven.reeds

    Uri Guttman Guest

    >>>>> "a" == anno4000 <-berlin.de> writes:

    a> Uri Guttman <> wrote in comp.lang.perl.misc:

    >> the issue of what to return for boolean is split. i currently prefer
    >> plain return as in stem it is needed in message delivery to denote NO
    >> VALUE returned as those methods are called in list context. return undef
    >> is a legit return and will cause a return message sent back whereas
    >> plain return won't.


    a> I can see situations where that behavior would be useful, but I wouldn't
    a> make it the standard. You wouldn't write

    a> is_even {
    a> my $n = shift;
    a> return if $n % 2;
    a> return 1;
    a> }

    a> would you?

    it all depends on what false value you want to return. perl has multiple
    falses as we know and they are subtly different in these
    situations. many/most perl builtins return ''/0 (sv has both integer and
    string) for false. some return undef (not sure if they return actual
    undef or equivilent of plain return).

    a> First, the question is clear for functions that return multiple values:
    a> Returning empty is just the special case where the number of returned
    a> values is 0. If an error situation must be distinguished from that
    a> case, another solution must be sought.

    that is the question. is the error return in-band or out of band? is
    undef a legit 'normal' value to return or is an empty list desired in
    when passing back no data?

    a> So the question only applies to functions that normally return a single
    a> scalar. In that case, returning empty should be reserved to signal
    a> failure. Even if the correct value to return is undef, the function
    a> should return that as a single value. The user can decide whether
    a> the difference matters:

    a> my $res = func( ...);

    but that can't tell if that was a plain return or return undef which is
    my point.

    a> will set $res to undef in failure situations and when a legit undef
    a> is returned. If the difference doesn't matter, that's all that's
    a> needed. If the difference does matter, the simple device of

    a> if ( my ( $res) = func( ...) ) {
    a> # got a value, use it
    a> } else {
    a> # failure
    a> }

    a> brings it out.

    not sure if that works the same as @res = func() which is what i do. i
    know for sure if i get a real undef vs an empty list that way. an array
    with undef in the first slot is true. i know a plain return will in a
    list context will be false so that probably works too.

    >> if you call something as in the map case above or in an arg list and do
    >> a plain return you get an empty list which will be thrown away. so some
    >> coders claim the sub should handle that and return a scalar value or a
    >> real undef. i say the caller should force the scalar context ...


    a> Again, I can see that this might useful behavior for a certain set
    a> of functions, but in general a function that is supposed to return
    a> a scalar should only fail to return a scalar if there is no scalar
    a> to return, not if the scalar happens to be undefined.

    it also comes down to whether you want the called to check for a special
    case of an empty list or you want a real scalar to fill in a slot. you
    can work around either way but i think having the called use scalar() to
    force a scalar value is better as it is clearer you want that in the
    slot. your code (which i use in stem to check for a real undef or any
    value vs plain return is slightly more complex and subtle. the reader of
    that code has to notice the () and also know that it will be false is a
    plain return is done. i do it since i must know that result and it is in
    only one key place and the plain return requirement is well documented
    and enforced. you can go either way and i have argued this before. i
    just prefer plain returns. and as a side point, i also encourage (so
    does PBP) explicit returns in all subs (except in single expression subs
    and such).

    it is all about api design and who does what with the return values.

    in any case this is a good topic to debate as it is not something
    covered in the docs or books and it hopefully will ejimicate some perl
    hackers about this.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Aug 10, 2006
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. rp
    Replies:
    1
    Views:
    499
    red floyd
    Nov 10, 2011
  2. Scott  Gilpin
    Replies:
    2
    Views:
    211
  3. Perl Learner

    Hashes of hashes or just one hash ?

    Perl Learner, Jun 8, 2005, in forum: Perl Misc
    Replies:
    11
    Views:
    207
  4. Tim O'Donovan

    Hash of hashes, of hashes, of arrays of hashes

    Tim O'Donovan, Oct 27, 2005, in forum: Perl Misc
    Replies:
    5
    Views:
    202
  5. Replies:
    3
    Views:
    199
Loading...

Share This Page