convert structured strings to possibly deep hash of hashes

S

seven.reeds

Hi,

I have a list of well structured strings, actually they are file paths.
This just measn there are strings of '/' seperated sub-strings. I can
easily split these into an array. My question is really one of
building a HoH based on all of the string records. My goal is to take
strings like:

/file.txt
/a/file.txt
/a/b/c
/a/b/c/file.txt
/z/m/w/file.txt

and produce something like:

%dir_hash(
'file.txt' => '',
'a' => {
'file.txt' => '',
'b' => {
'c' => {
'file.txt'
}
}
},
'z' => {
'm' => {
'w' => {
'file.txt' => ''
}
}
}
)

My current feeble attempt was the following. I am getting lost in the
"\%"s and the "%{}"s. To pre-answer some obvious suggestions: yes, I
have read perlref and perlreftut and my perl book(s) but I am not
seeing how to recursibely build/populate a HOH in this situation. I
have looked through CPAN and did find the IO::Dir module which might
help if the data was not static and i was pulling it from the
originating directory instead of a flat file.

suggestions?



#!/usr/local/bin/perl -w

use strict;
select(STDOUT);
$|++;

use Data::Dumper;

my @path = ();
my %path_hash = ();

@path = split('/', "/a/b/c/d/e/f/g.ics");
shift(@path) if ($path[0] eq "");

&make_path_hash(\%path_hash, @path);

print STDOUT Dumper(%path_hash);


sub make_path_hash
{
my ($path_hash, $this, @list) = @_;

return if (! defined($this));

$this =~ s/^\s+//s;
$this =~ s/\s+$//s;

return if ($this eq "");

$path_hash->{$this} = 1 if (! defined($path_hash->{$this}));

&make_path_hash(\%{ $path_hash->{$this} }, @list);
}
 
A

anno4000

seven.reeds said:
Hi,

I have a list of well structured strings, actually they are file paths.
This just measn there are strings of '/' seperated sub-strings. I can
easily split these into an array. My question is really one of
building a HoH based on all of the string records. My goal is to take
strings like:

/file.txt
/a/file.txt
/a/b/c
/a/b/c/file.txt
/z/m/w/file.txt

and produce something like:

%dir_hash(
'file.txt' => '',
'a' => {
'file.txt' => '',
'b' => {
'c' => {
'file.txt'
}
}
},
'z' => {
'm' => {
'w' => {
'file.txt' => ''
}
}
}
)

Here is a recursive method:

my @files = qw(
/file.txt
/a/file.txt
/a/b/c
/a/b/c/file.txt
/z/m/w/file.txt
);

my %dir_hash;
for ( @files ) {
my ( @parts) = split m{/};
add_parts( \ %dir_hash, @parts);
}
print Dumper \ %dir_hash;
exit;

sub add_parts {
my ( $hash, $first, @parts) = @_;
$hash->{ $first} ||= {};
if ( @parts ) {
add_parts( $hash->{ $first}, @parts);
} else {
$hash->{ $first} = '';
}
}

It doesn't reproduce your structure exactly because all files are
given absolute path names. Therefore the root of the file system
is represented in the hash as a top node (which is an empty string).
Specify the files with relative path names (remove the leading "/"s)
to avoid that.

Anno
 
B

Brian McCauley

Michele said:
my $last=\%dirhash;
$last=$last->{$_} ||= {} for split qr|/|;

I generally prefer to avoid the extra level of empty hash wth a double
reference and autovivification.

my $last=\\%dirhash;
$last=\$$last->{$_} for split qr|/|;
 
U

Uri Guttman

MD> Cool! Hadn't thought of it and I was just thinking: I wish perl would
MD> autovivify this too...

that is how i would have done it too. the recursive version that anno
posted was too much work and this looping/autovivify concept works well.

i have similar code in my autoviv tutorial at:

http://sysarch.com/Perl/autoviv.txt

uri
 
A

anno4000

A. Sinan Unur said:
Looking at your deep_exists sub:

sub deep_exists {
my( $hash_ref, @keys ) = @_ ;
unless ( @keys ) {
warn "deep_exists: no keys" ;
return ;
}

foreach my $key ( @keys ) {
unless( ref $hash_ref eq 'HASH' ) {
warn "$hash_ref not a HASH ref" ;
return ;
}

return 0 unless exists( $hash_ref->{$key} ) ;
# ^^^^^^^^
# Shouldn't this be a plain return ?
#

Absolutely not!

Like its model exists(), deep_exists() is supposed to return a
boolean value. Returning empty is not a "better way to return
false" it is a way of indicating failure. Thus returning empty
as it does when no keys are given or if a ref is not a hash is
adequate, but not when non-existence is detected.

If you called deep_exists in a map like this:

my @res = map deep_exists( $_, @keys), @hashes;

you'd expect an array as long as @hashes filled with true and
false values, not an array of only true values, as many as were
returned.

Anno
 
U

Uri Guttman

ASU> (e-mail address removed)-berlin.de wrote in ASU> @news.dfncis.de:

ASU> ...

ASU> ...

and that could be fixed by calling scalar on deep_exists.

ASU> Ah-a! It makes a lot of sense with the usage example you give above.

ASU> I can think of some other usage scenarios where the other way
ASU> seems to make sense to me, but since I have never written code
ASU> that uses a deep_exists style test, I'll assume that the example
ASU> you give is the appropriate case to consider. Thank you for the
ASU> clarification.

the issue of what to return for boolean is split. i currently prefer
plain return as in stem it is needed in message delivery to denote NO
VALUE returned as those methods are called in list context. return undef
is a legit return and will cause a return message sent back whereas
plain return won't.

if you call something as in the map case above or in an arg list and do
a plain return you get an empty list which will be thrown away. so some
coders claim the sub should handle that and return a scalar value or a
real undef. i say the caller should force the scalar context if they
want a scalar value in a list context. the caller knows more about what
they want and they can control things with scalar. what if you did want
an empty list when assigning to an array and not a single entry with
undef? the caller cannot fix that without a special case checking for
just undef in the first slot. so i say my way of using a plain return is
more flexible and puts the onus on the caller to do the correct thing
with the result. plain return was designed to return undef in scalar and
() in list so it would be false in both scalar and list contexts. just
using scalar() will make that work in a list context and is clearer to
the reader that you are expecting a scalar (boolean) there.

now, i wrote that tutorial a long time ago and that code isn't something
i use. i would probably keep a literal false there if i used it since it
only can return a boolean but i am also leaning to a plain return due to
my reasoning above. in my production code i always use plain return (as
is required in stem).

but this was a good argument for now i have logical backing for my
answer. a caller can use scalar in a list context to handle a plain
return but there is no easy way to fix the problem of a literal undef
boolean returned to an array.

uri
 
A

anno4000

Uri Guttman said:
ASU> (e-mail address removed)-berlin.de wrote in ASU> @news.dfncis.de:


ASU> ...


ASU> ...


and that could be fixed by calling scalar on deep_exists.

ASU> Ah-a! It makes a lot of sense with the usage example you give above.

ASU> I can think of some other usage scenarios where the other way
ASU> seems to make sense to me, but since I have never written code
ASU> that uses a deep_exists style test, I'll assume that the example
ASU> you give is the appropriate case to consider. Thank you for the
ASU> clarification.

the issue of what to return for boolean is split. i currently prefer
plain return as in stem it is needed in message delivery to denote NO
VALUE returned as those methods are called in list context. return undef
is a legit return and will cause a return message sent back whereas
plain return won't.

I can see situations where that behavior would be useful, but I wouldn't
make it the standard. You wouldn't write

is_even {
my $n = shift;
return if $n % 2;
return 1;
}

would you?

Here is my take:

First, the question is clear for functions that return multiple values:
Returning empty is just the special case where the number of returned
values is 0. If an error situation must be distinguished from that
case, another solution must be sought.

So the question only applies to functions that normally return a single
scalar. In that case, returning empty should be reserved to signal
failure. Even if the correct value to return is undef, the function
should return that as a single value. The user can decide whether
the difference matters:

my $res = func( ...);

will set $res to undef in failure situations and when a legit undef
is returned. If the difference doesn't matter, that's all that's
needed. If the difference does matter, the simple device of

if ( my ( $res) = func( ...) ) {
# got a value, use it
} else {
# failure
}

brings it out.
if you call something as in the map case above or in an arg list and do
a plain return you get an empty list which will be thrown away. so some
coders claim the sub should handle that and return a scalar value or a
real undef. i say the caller should force the scalar context ...

Again, I can see that this might useful behavior for a certain set
of functions, but in general a function that is supposed to return
a scalar should only fail to return a scalar if there is no scalar
to return, not if the scalar happens to be undefined.

Anno
 
U

Uri Guttman

a> I can see situations where that behavior would be useful, but I wouldn't
a> make it the standard. You wouldn't write

a> is_even {
a> my $n = shift;
a> return if $n % 2;
a> return 1;
a> }

a> would you?

it all depends on what false value you want to return. perl has multiple
falses as we know and they are subtly different in these
situations. many/most perl builtins return ''/0 (sv has both integer and
string) for false. some return undef (not sure if they return actual
undef or equivilent of plain return).

a> First, the question is clear for functions that return multiple values:
a> Returning empty is just the special case where the number of returned
a> values is 0. If an error situation must be distinguished from that
a> case, another solution must be sought.

that is the question. is the error return in-band or out of band? is
undef a legit 'normal' value to return or is an empty list desired in
when passing back no data?

a> So the question only applies to functions that normally return a single
a> scalar. In that case, returning empty should be reserved to signal
a> failure. Even if the correct value to return is undef, the function
a> should return that as a single value. The user can decide whether
a> the difference matters:

a> my $res = func( ...);

but that can't tell if that was a plain return or return undef which is
my point.

a> will set $res to undef in failure situations and when a legit undef
a> is returned. If the difference doesn't matter, that's all that's
a> needed. If the difference does matter, the simple device of

a> if ( my ( $res) = func( ...) ) {
a> # got a value, use it
a> } else {
a> # failure
a> }

a> brings it out.

not sure if that works the same as @res = func() which is what i do. i
know for sure if i get a real undef vs an empty list that way. an array
with undef in the first slot is true. i know a plain return will in a
list context will be false so that probably works too.

a> Again, I can see that this might useful behavior for a certain set
a> of functions, but in general a function that is supposed to return
a> a scalar should only fail to return a scalar if there is no scalar
a> to return, not if the scalar happens to be undefined.

it also comes down to whether you want the called to check for a special
case of an empty list or you want a real scalar to fill in a slot. you
can work around either way but i think having the called use scalar() to
force a scalar value is better as it is clearer you want that in the
slot. your code (which i use in stem to check for a real undef or any
value vs plain return is slightly more complex and subtle. the reader of
that code has to notice the () and also know that it will be false is a
plain return is done. i do it since i must know that result and it is in
only one key place and the plain return requirement is well documented
and enforced. you can go either way and i have argued this before. i
just prefer plain returns. and as a side point, i also encourage (so
does PBP) explicit returns in all subs (except in single expression subs
and such).

it is all about api design and who does what with the return values.

in any case this is a good topic to debate as it is not something
covered in the docs or books and it hopefully will ejimicate some perl
hackers about this.

uri
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top