A "deep" hash to/from a file?

Pat · Jan 24, 2008

Hi,

This seems like something that would be a FAQ, but I couldn't find an
answer there.

I would like to build a hash from an input text file. The hash can
contain other hashes, and so on, to any depth. The format of the input
file can be anything that is convenient, e.g.:

---------------------------------
root
{
    branch_1
    {
        sub_branch_1
        {
            leaf_1 = "This is leaf_1"
            leaf_2 = "This is leaf_2"
        }
    }

    branch_2
    {
        leaf_3 = "This is leaf_3"
    }

    leaf_4 = "This is leaf_4"
}
---------------------------------

I would like that input file to generate a hash equivalent to the one
this Perl code would build:

$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_1'} = "This is leaf_1";
$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_2'} = "This is leaf_2";
$tree{'root'}{'branch_2'}{'leaf_3'} = "This is leaf_3";
$tree{'root'}{'leaf_4'} = "This is leaf_4";
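
The title also asks about going "to" a file. That direction is a short recursion over the hash. The sketch below is a minimal version that assumes two things: every hash reference is a branch, and every other value is a leaf string containing no double quotes. `write_tree` is a hypothetical helper and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: print a nested hash in the brace format from the question.
# Hash references become branches; any other value is written as a quoted leaf.
sub write_tree {
    my ($fh, $tree, $depth) = @_;
    my $indent = '    ' x $depth;
    for my $key (sort keys %$tree) {
        my $value = $tree->{$key};
        if (ref $value eq 'HASH') {
            print {$fh} "$indent$key\n$indent\{\n";
            write_tree($fh, $value, $depth + 1);   # recurse one level deeper
            print {$fh} "$indent}\n";
        }
        else {
            print {$fh} qq{$indent$key = "$value"\n};
        }
    }
}

my %tree;
$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_1'} = "This is leaf_1";
$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_2'} = "This is leaf_2";
$tree{'root'}{'branch_2'}{'leaf_3'} = "This is leaf_3";
$tree{'root'}{'leaf_4'} = "This is leaf_4";

# Write to an in-memory string here; pass a real file handle to write a file.
open my $out, '>', \my $text or die "open: $!";
write_tree($out, \%tree, 0);
close $out;
print $text;
```

Keys are sorted, so the same hash always produces the same file. That makes the output easy to diff.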

As a first attempt, I tried writing a recursive subroutine that would try
to match pairs of open/close curly braces, then call itself on everything
between the braces. But I couldn't get the right amount of "greediness"
in the matching.

Any help appreciated.

Pat

Damian Lukowski · Jan 24, 2008

Pat said:
As a first attempt, I tried writing a recursive subroutine that would try
to match pairs of open/close curly braces, then call itself on everything
between the braces. But I couldn't get the right amount of "greediness"
in the matching.

I think a recursive descent parser will be the best and easiest way to
do it.

Pat · Jan 24, 2008

Damian said:
I think a recursive descent parser will be the best and easiest way
to do it.

Yeah, I hoped to come up with something like that, but failed to do the job
with a regular expression. The problem is with the nested curly braces.
For example, for the following input string...

a { b { c } d { e } }

If you use non-greedy matching, you get this as your first match:

a { b { c }

... and that is clearly incorrect. But if you use normal greedy matching,
you get this as your first match:

a { b { c } d { e } }

... which is correct, so you recurse on the string between the outermost
braces, which is:

b { c } d { e }

and there, greedy matching fails, because it thinks that you want to parse
this:

c } d { e

... because that is what is inside the outermost braces.
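
The diagnosis is right: neither a greedy nor a non-greedy `.*` can count nesting depth. One aside that nobody in the thread raised: Perl 5.10, released in December 2007, added recursive patterns. These can match a balanced `{ ... }` group directly. The sketch below is minimal and assumes names are `\w+` words. `top_level_groups` is a hypothetical helper.

```perl
use strict;
use warnings;
use 5.010;   # recursive (?&name) patterns and possessive ++ need Perl 5.10+

# A balanced brace group: "{", then any mix of non-brace runs and nested
# balanced groups, then the matching "}".
my $braced = qr/(?<braced>\{(?:[^{}]++|(?&braced))*\})/;

# Hypothetical helper: return [NAME, "{ ... }"] pairs for each top-level
# group in the string, without descending into the nested ones.
sub top_level_groups {
    my ($s) = @_;
    my @groups;
    while ($s =~ /(\w+)\s*($braced)/g) {
        push @groups, [ $1, $2 ];
    }
    return @groups;
}

for my $g (top_level_groups('a { b { c } d { e } }')) {
    say "$g->[0] => $g->[1]";            # a => { b { c } d { e } }
}
for my $g (top_level_groups('b { c } d { e }')) {
    say "$g->[0] => $g->[1]";            # b => { c }, then d => { e }
}
```

Each returned group can be recursed on after stripping its outer braces. That is exactly the step where plain greedy matching went wrong.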

Damian Lukowski · Jan 24, 2008

Pat said:
Yeah, I hoped to come up with something like that, but failed to do the job
with a regular expression. The problem is with the nested curly braces.
For example, for the following input string...

That's not how a descent parser works. Simply speaking, if you want to
parse a branch with all its sub-branches and leaves, you have to check
whether the string begins with "BRANCHNAME {". If so, call a subroutine
that parses a branch. This subroutine has to check for other branches
and leaves, and finally it has to chop off the closing curly brace.
It might look like the following:

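use strict;
use warnings;

# Parse one "NAME { ... }" branch from the front of $$stringref and
# return (NAME, \%contents). The string is passed by reference, so each
# recursive call consumes input from the same buffer.
# e.g. my %tree = parse_branch(\$input);   # then $tree{root}{leaf_4}
sub parse_branch
{
    my $stringref = shift;
    my $name;
    my %contents;    # leaves and sub-branches of this branch, keyed by name

    if ($$stringref =~ s/^\s*(\w+)\s*\{//)    # Parse "itself"
    {
        $name = $1;
    } else
    {
        die "Parse error: expected 'NAME {'";
    }

    # Loop while the next token is a leaf ("NAME =") or a branch ("NAME {").
    while ($$stringref =~ /^\s*\w+\s*[={]/)
    {
        if ($$stringref =~ /^\s*\w+\s*=/)      # Parse leaf
        {
            my ($key, $value) = parse_assignment($stringref);
            $contents{$key} = $value;
        }
        else                                    # Parse sub-branch
        {
            my ($key, $branch) = parse_branch($stringref);
            $contents{$key} = $branch;
        }
    }

    if ($$stringref =~ s/^\s*\}//)             # Chop closing brace
    {
        return ($name, \%contents);
    } else
    {
        die "Parse error: expected '}' to close '$name'";
    }
}

# Parse one 'NAME = "VALUE"' leaf from the front of $$stringref.
sub parse_assignment
{
    my $stringref = shift;
    if ($$stringref =~ s/^\s*(\w+)\s*=\s*"([^"]*)"//)
    {
        return ($1, $2);
    }
    die "Parse error: expected 'NAME = \"VALUE\"'";
}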

Damian Lukowski · Jan 24, 2008

Sorry, I mixed up return types and the shift. Forget the stringref shift
at the beginning and think of it as a global variable.

Tad J McClellan · Jan 24, 2008

Abigail said:
Pat wrote on VCCLIX September MCMXCIII:
:: Hi,
::
:: This seems like something that would be a FAQ, but I couldn't find an
:: answer there.
::
:: I would like to build a hash from an input text file. The hash can
:: contain other hashes, and so on, to any depth. The format of the input
:: file can be anything that is convenient, e.g.:

If it can be anything that is convenient, why not pick a format that
can already be parsed by a module? You might want to store it in the
format of a serializer (for instance YAML or Data::Dumper).

If the data is really as consistent as the example data, then
running it through these substitutions gets it nearly into
Data::Dumper format:

s/=/=>/;
s/\{/=> {/;
s/}(\s+)/},$1/;
s/"(\s+)/",$1/;
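
The sketch below shows that route in minimal form. It assumes the input is laid out exactly like Pat's example: one brace, branch name or leaf per line, and no `=`, `{` or leading whitespace inside leaf values. It applies the four substitutions line by line, then evals the result as a Perl list. Because eval runs the converted file as Perl code, this is only reasonable for trusted input.

```perl
use strict;
use warnings;
use Data::Dumper;

# Pat's example input (in practice, read it from the file).
my $input = <<'END';
root
{
branch_1
{
sub_branch_1
{
leaf_1 = "This is leaf_1"
leaf_2 = "This is leaf_2"
}
}

branch_2
{
leaf_3 = "This is leaf_3"
}

leaf_4 = "This is leaf_4"
}
END

# Apply the four substitutions to each line. Newlines are kept, so \s+ can
# match them after a closing brace or closing quote.
my $perl = '';
for my $line (split /^/, $input) {
    $line =~ s/=/=>/;          # leaf "=" becomes a fat comma
    $line =~ s/\{/=> {/;       # "name {" becomes "name => {"
    $line =~ s/}(\s+)/},$1/;   # comma after each closing brace
    $line =~ s/"(\s+)/",$1/;   # comma after each closing quote
    $perl .= $line;
}

# The result is a list of key => value pairs; eval it to build the hash.
my %tree = eval "($perl)";
die "Could not eval converted input: $@" if $@;

print Dumper(\%tree);
```
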

Jürgen Exner · Jan 24, 2008

Pat said:
I would like to build a hash from an input text file. The hash can
contain other hashes, and so on, to any depth. The format of the input
file can be anything that is convenient, e.g.:

You may want to have a look at
Data::Dumper - stringified perl data structures, suitable for both
printing and "eval"

It is designed specifically for that purpose.
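
The sketch below shows a Data::Dumper round trip through a file in minimal form, using only core modules. `Data::Dumper->Dump` with a variable name emits Perl code of the form `$tree = { ... };`, and evaluating that code later rebuilds the structure. The temporary file stands in for whatever file you actually use. As with any eval of file contents, the file must be trusted.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

my %tree;
$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_1'} = "This is leaf_1";
$tree{'root'}{'branch_2'}{'leaf_3'} = "This is leaf_3";
$tree{'root'}{'leaf_4'} = "This is leaf_4";

$Data::Dumper::Sortkeys = 1;   # stable key order makes the file diff-friendly
$Data::Dumper::Indent   = 1;   # fixed-width indentation

# Write: the dump is Perl source code that assigns to $tree.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} Data::Dumper->Dump([ \%tree ], [ 'tree' ]);
close $fh or die "close $filename: $!";

# Read: slurp the file back and eval it. The code assigns to the lexical
# $tree declared below, because string eval sees the enclosing lexicals.
open my $in, '<', $filename or die "open $filename: $!";
my $code = do { local $/; <$in> };
close $in;

my $tree;
eval $code;
die "eval of $filename failed: $@" if $@;

print $tree->{'root'}{'branch_2'}{'leaf_3'}, "\n";   # This is leaf_3
```
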

jue

Ted Zlatanov · Jan 24, 2008

DL> That's not how a descent parser works. Simply speaking, if you want to
DL> parse a branch with all its sub-branches and leaves, you have to check
DL> whether the string begins with "BRANCHNAME {". If so, call a subroutine
DL> that parses a branch. This subroutine has to check for other branches
DL> and leaves, and finally it has to chop off the closing curly brace.
DL> It might look like the following:

You should consider Parse::RecDescent for a simple solution to
recursive-descent parsing in Perl 5. It's actually pretty similar to
what Perl 6 will do with grammars. Your code may work, but it's hard to
maintain if the format changes even slightly.

Anyhow, I wouldn't use P::RD either. The OP should look into a true
structured format like YAML or XML or Config*, as others have suggested.
Even JSON might work; it's a pretty nice format actually. "Simple"
formats like the one the OP shows tend to evolve into ugliness over time
and become unmaintainable messes, so unless there's a very good reason
I'd stick with something more standard.
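
If JSON is chosen, the core JSON::PP module (bundled with Perl since 5.14) handles both directions. The sketch below is minimal and leaves out the file I/O: the encoded string stands in for the file's contents.

```perl
use strict;
use warnings;
use JSON::PP;

my %tree;
$tree{'root'}{'branch_1'}{'sub_branch_1'}{'leaf_1'} = "This is leaf_1";
$tree{'root'}{'branch_2'}{'leaf_3'} = "This is leaf_3";
$tree{'root'}{'leaf_4'} = "This is leaf_4";

# canonical: sort keys for stable output; pretty: indent and add newlines.
my $json = JSON::PP->new->canonical(1)->pretty;

my $text = $json->encode(\%tree);   # what you would write to the file
my $copy = $json->decode($text);    # what you get back after reading it

print $text;
print $copy->{'root'}{'branch_2'}{'leaf_3'}, "\n";   # This is leaf_3
```

Unlike the eval-based approaches, decoding JSON never executes the file's contents.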

If a custom format is absolutely necessary, consider losing the braces
and the structure, going to a line-oriented format instead:

root branch_1 sub_branch_1 leaf_1 = This is leaf_1
root branch_1 sub_branch_1 leaf_2 = This is leaf_2
...

(assuming no spaces in node names)

This is much easier to parse, and it's understandable. It's verbose,
but that's a problem I'd be willing to live with. Parsing then becomes
just:

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my $config = {};
while (my $line = <DATA>)
{
    chomp $line;
    # Split on the first " = " only, so values may contain " = " too.
    my ($path, $data) = split / = /, $line, 2;
    my @path = split ' ', $path;
    my $where = $config;
    while (scalar @path)
    {
        my $node = shift @path;
        if (scalar @path) # do we have more branches?
        {
            $where->{$node} = {} unless exists $where->{$node};
            $where = $where->{$node};
        }
        else
        {
            $where->{$node} = $data;   # last path element holds the value
        }
    }
}

print Dumper $config;

__DATA__
root branch_1 sub_branch_1 leaf_1 = This is leaf_111
root branch_1 sub_branch_1 leaf_2 = This is leaf_112
root branch_1 sub_branch_2 leaf_1 = This is leaf_121
root branch_2 = This is branch 2
root branch_2 = This is branch 2 again

Ted

Pat · Jan 25, 2008

Jürgen Exner said:
You may want to have a look at
Data::Dumper - stringified perl data structures, suitable for both
printing and "eval"

It is designed specifically for that purpose.

Thanks for all the suggestions. Using Dumper and eval() turned out to be
pretty straightforward. Just a slight reformatting was needed to match the
Dumper syntax.

Pat

Unable to read input from keyboard, in below C code, for a BST.	0	Jul 20, 2025
I made a blockchain and want to make a cryptocurrency, but my code doesn't verify hash of each block	2	Jun 2, 2024
How do I move a file from one folder to another on a server?	3	Aug 24, 2025
Can I convert a PST file to EML if Outlook is not installed on my system?	0	Apr 18, 2026
How to transform a .pst file into a .eml file?	4	Jan 15, 2025
TIEHANDLE and deep recursion	11	May 9, 2012
How do I copy a image, and video from one location to another, and rename the file?	0	Oct 20, 2025
How to effectively develop a web application from scratch?	0	Jul 2, 2023
