A "deep" hash to/from a file?

Pat

Hi,

This seems like something that would be a FAQ, but I couldn't find an
answer there.

I would like to build a hash from an input text file. The hash can
contain other hashes, and so on, to any depth. The format of the input
file can be anything that is convenient, e.g.:

---------------------------------
root
{
    branch_1
    {
        sub_branch_1
        {
            leaf_1 = "This is leaf_1"
            leaf_2 = "This is leaf_2"
        }
    }

    branch_2
    {
        leaf_3 = "This is leaf_3"
    }

    leaf_4 = "This is leaf_4"
}
---------------------------------

I would like for that input file to generate a hash equivalent to what
this perl code would:

$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_1'} = "This is leaf_1";
$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_2'} = "This is leaf_2";
$tree{'root'}{'branch_2'}{'leaf_3'} = "This is leaf_3";
$tree{'root'}{'leaf_4'} = "This is leaf_4";

As a first attempt, I tried writing a recursive subroutine that would try
to match pairs of open/close curly braces, then call itself on everything
between the braces. But I couldn't get the right amount of "greediness"
in the matching.

Any help appreciated.

Pat
 
Damian Lukowski

Pat said:
As a first attempt, I tried writing a recursive subroutine that would try
to match pairs of open/close curly braces, then call itself on everything
between the braces. But I couldn't get the right amount of "greediness"
in the matching.


I think a recursive descent parser will be the best and easiest way to
do it.
 
Pat

Damian said:
I think a recursive descent parser will be the best and easiest way
to do it.

Yeah, I hoped to come up with something like that, but failed to do the job
with a regular expression. The problem is with the nested curly braces.
For example, for the following input string...

a { b { c } d { e } }

If you use non-greedy matching, you get this as your first match:

a { b { c }

... and that is clearly incorrect. But if you use normal greedy matching,
you get this as your first match:

a { b { c } d { e } }

... which is correct, so you recurse on the string between the outermost
braces, which is:

b { c } d { e }

and there, greedy matching fails, because it thinks that you want to parse
this:

c } d { e

... because that is what is inside the outermost braces.
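
To make the problem concrete, here is a small sketch that reproduces the three matches described above. The exact patterns are an assumption (the original attempt was not posted); they are just the simplest greedy and non-greedy forms that show the same behaviour.

```perl
use strict;
use warnings;

my $outer = 'a { b { c } d { e } }';

# Non-greedy: .*? stops at the first closing brace it can reach.
my ($lazy) = $outer =~ /(\w+\s*\{.*?\})/;

# Greedy: .* runs to the last closing brace in the string.
my ($greedy) = $outer =~ /(\w+\s*\{.*\})/;

# Recursing on the text between the outermost braces, greedy matching
# pairs the first '{' with the last '}', which belong to different branches.
my $inner = 'b { c } d { e }';
my ($inner_greedy) = $inner =~ /\{(.*)\}/;

print "non-greedy:      '$lazy'\n";
print "greedy:          '$greedy'\n";
print "greedy on inner: '$inner_greedy'\n";
```

Neither quantifier counts braces, so no choice of greediness alone can pair them up correctly at every level.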
 
Damian Lukowski

Pat said:
Yeah, I hoped to come up with something like that, but failed to do the job
with a regular expression. The problem is with the nested curly braces.
For example, for the following input string...

That's not how a recursive descent parser works. Simply put, if you want to
parse a branch with all its sub-branches and leaves, you have to check whether
the string begins with "BRANCHNAME {". If so, call a subroutine that
parses a branch. That subroutine has to check for further branches and
leaves, and finally it has to chop off the closing curly brace.
It might look like the following:

# Pseudocode: begins_with, cut_off_* and the UPPERCASE names are placeholders.
sub parse_branch
{
    my $stringref = shift;
    my $name;
    my @branches = ();
    my @leaves = ();

    if ($$stringref begins_with "BRANCHNAME {") # Parse "itself"
    {
        $name = BRANCHNAME;
        cut_off_BRANCHNAME_and_curly($$stringref);
    }
    else
    {
        die "Parse Error";
    }

    while ($$stringref begins_with LVALUE_or_BRANCHNAME)
    {
        if ($$stringref begins_with "LVALUE =") # Parse leaf.
        {
            push @leaves, parse_assignment($stringref);
        }
        elsif ($$stringref begins_with "BRANCHNAME {") # Parse sub-branch.
        {
            push @branches, parse_branch($stringref);
        }
    }

    if ($$stringref begins_with "}") # Chop closing brace.
    {
        cut_off_curly($$stringref);
        return new Branch($name, \@branches, \@leaves);
    }
    else
    {
        die "Parse Error";
    }
}

sub parse_assignment { ... }
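
For anyone who wants something runnable, here is a minimal sketch of the same idea in real Perl, assuming the brace format from the original question (names are \w+ words, values are double-quoted strings without embedded quotes). The name parse_body and its $top_level flag are invented for this example, and it returns plain nested hash references rather than the Branch object used in the pseudocode.

```perl
use strict;
use warnings;

# Parse a sequence of leaves and branches from the string referenced by
# $ref, consuming the input as it goes. Returns a hash reference.
sub parse_body {
    my ($ref, $top_level) = @_;
    my %node;
    while (1) {
        $$ref =~ s/\A\s+//;                            # skip whitespace
        if ($$ref =~ s/\A(\w+)\s*=\s*"([^"]*)"//) {    # leaf: NAME = "value"
            $node{$1} = $2;
        }
        elsif ($$ref =~ s/\A(\w+)\s*\{//) {            # branch: NAME {
            my $name = $1;                             # copy before recursing
            $node{$name} = parse_body($ref, 0);
        }
        elsif ($$ref =~ s/\A\}//) {                    # end of this branch
            die "Parse error: unexpected '}'\n" if $top_level;
            return \%node;
        }
        elsif ($$ref eq '') {                          # end of input
            die "Parse error: missing '}'\n" unless $top_level;
            return \%node;
        }
        else {
            die "Parse error near: " . substr($$ref, 0, 20) . "\n";
        }
    }
}

my $text = <<'END';
root
{
    branch_1
    {
        sub_branch_1
        {
            leaf_1 = "This is leaf_1"
            leaf_2 = "This is leaf_2"
        }
    }

    branch_2
    {
        leaf_3 = "This is leaf_3"
    }

    leaf_4 = "This is leaf_4"
}
END

my $tree = parse_body(\$text, 1);
print $tree->{root}{branch_1}{sub_branch_1}{leaf_1}, "\n";
```

Each regex only ever looks at the front of the remaining input, so greediness never comes into it: the recursion itself keeps track of which closing brace belongs to which branch.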
 
Damian Lukowski

Sorry, I mixed up return types and the shift. Forget the stringref shift
at the beginning and think of it as a global variable.
 
Tad J McClellan

Abigail said:
Pat wrote on VCCLIX September MCMXCIII:
:: Hi,
::
:: This seems like something that would be a FAQ, but I couldn't find an
:: answer there.
::
:: I would like to build a hash from an input text file. The hash can
:: contain other hashes, and so on, to any depth. The format of the input
:: file can be anything that is convenient, e.g.:

If it can be anything that is convenient, why not pick a format that
can already be parsed by a module? You might want to store it in the
format of a serializer (for instance YAML or Data::Dumper),


If the data is really as consistent as the example data, then
running it through these gets it nearly into Data::Dumper format:

s/=/=>/;
s/\{/=> {/;
s/}(\s+)/},$1/;
s/"(\s+)/",$1/;
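
As a rough illustration of what those four substitutions do, here is a sketch that applies them line by line to a cut-down version of the sample data. The driver loop and variable names are assumptions added for the example; only the s/// lines come from the post.

```perl
use strict;
use warnings;

my $input = <<'END';
root
{
branch_2
{
leaf_3 = "This is leaf_3"
}
leaf_4 = "This is leaf_4"
}
END

my @lines = split /^/, $input;    # split /^/ keeps each line's newline
for (@lines) {
    s/=/=>/;
    s/\{/=> {/;
    s/}(\s+)/},$1/;
    s/"(\s+)/",$1/;
}
my $output = join '', @lines;
print $output;
```

The result has "=>" after each name and a comma after each value and closing brace. As the post says, that is only nearly Data::Dumper format: the text would still need an enclosing pair of braces (and the trailing comma dealt with) before it could be evaluated as a hash.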
 
Jürgen Exner

Pat said:
I would like to build a hash from an input text file. The hash can
contain other hashes, and so on, to any depth. The format of the input
file can be anything that is convenient, e.g.:

You may want to have a look at
Data::Dumper - stringified perl data structures, suitable for both
printing and "eval"

It is designed specifically for that purpose.

jue
 
Ted Zlatanov

DL> That's not how a recursive descent parser works. Simply put, if you want to
DL> parse a branch with all its sub-branches and leaves, you have to check whether
DL> the string begins with "BRANCHNAME {". If so, call a subroutine that
DL> parses a branch. That subroutine has to check for further branches and
DL> leaves, and finally it has to chop off the closing curly brace.
DL> It might look like the following:

You should consider Parse::RecDescent for a simple solution to
recursive-descent parsing in Perl 5. It's actually pretty similar to
what Perl 6 will do with grammars. Your code may work, but it's hard to
maintain it if the format changes even slightly.

Anyhow, I wouldn't use P::RD either. The OP should look into a true
structured format like YAML or XML or Config*, as others have suggested.
Even JSON might work; it's a pretty nice format actually. "Simple"
formats like the one the OP shows tend to evolve into ugliness over time
and become unmaintainable messes, so unless there's a very good reason
I'd stick with something more standard.

If a custom format is absolutely necessary, consider losing the braces
and the structure, going to a line-oriented format instead:

root branch_1 sub_branch_1 leaf_1 = This is leaf_1
root branch_1 sub_branch_1 leaf_2 = This is leaf_2
....

(assuming no spaces in node names)

This is much easier to parse, and it's understandable. It's verbose,
but that's a problem I'd be willing to live with. Parsing then becomes
just:

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my $config = {};
while (my $line = <DATA>)
{
    chomp $line;
    my ($path, $data) = split / = /, $line;
    my @path = split ' ', $path;
    my $where = $config;
    while (scalar @path)
    {
        my $node = shift @path;
        if (scalar @path) # do we have more branches?
        {
            $where->{$node} = {} unless exists $where->{$node};
            $where = $where->{$node};
        }
        else
        {
            $where->{$node} = $data;
        }
    }
}

print Dumper $config;

__DATA__
root branch_1 sub_branch_1 leaf_1 = This is leaf_111
root branch_1 sub_branch_1 leaf_2 = This is leaf_112
root branch_1 sub_branch_2 leaf_1 = This is leaf_121
root branch_2 = This is branch 2
root branch_2 = This is branch 2 again
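
Going the other way (hash to file) is a short recursive walk. This is only a sketch under the same assumptions as above (no spaces in node names, leaves are plain strings); the name flatten is made up for the example.

```perl
use strict;
use warnings;

# Recursively walk $node and return one "path = value" line per leaf.
# @path holds the node names leading down to $node.
sub flatten {
    my ($node, @path) = @_;
    my @lines;
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        if (ref $value eq 'HASH') {
            push @lines, flatten($value, @path, $key);    # descend
        }
        else {
            push @lines, join(' ', @path, $key) . " = $value";
        }
    }
    return @lines;
}

my $config = {
    root => {
        branch_1 => { sub_branch_1 => { leaf_1 => 'This is leaf_111' } },
        branch_2 => 'This is branch 2',
    },
};

my @lines = flatten($config);
print "$_\n" for @lines;
```

The keys are sorted only so that the output order is stable from run to run.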

Ted
 
Pat

Jürgen Exner said:
You may want to have a look at
Data::Dumper - stringified perl data structures, suitable for both
printing and "eval"

It is designed specifically for that purpose.

Thanks for all the suggestions. Using Dumper and eval() turned out to be
pretty straightforward. Just a slight reformatting was needed to match the
Dumper syntax.
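
For the archives, a minimal sketch of that round trip, not necessarily the exact code used here. It assumes a temporary file stands in for the real config file, and it sets $Data::Dumper::Terse so the dump is a bare hash literal rather than a "$VAR1 = ..." assignment. Note that eval runs whatever Perl code is in the file, so this is only suitable for files you trust.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

$Data::Dumper::Terse    = 1;    # emit "{ ... }" instead of "$VAR1 = { ... };"
$Data::Dumper::Sortkeys = 1;    # stable key order in the file

my %tree = (
    root => {
        branch_1 => {
            sub_branch_1 => {
                leaf_1 => 'This is leaf_1',
                leaf_2 => 'This is leaf_2',
            },
        },
        branch_2 => { leaf_3 => 'This is leaf_3' },
        leaf_4   => 'This is leaf_4',
    },
);

# Write the structure out.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} Dumper(\%tree);
close $out or die "Cannot close $filename: $!";

# Read it back in one gulp and evaluate it.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $text = do { local $/; <$in> };
close $in;

# The leading unary plus makes Perl treat "{" as a hash constructor,
# not as the start of a block.
my $loaded = eval "+$text";
die "Could not parse $filename: $@" if $@;

print $loaded->{root}{branch_1}{sub_branch_1}{leaf_1}, "\n";
```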

Pat
 
