Hashtable of arrays

Apokrif

I've got a lexicon file which contains lines such as "chaise,
fauteuil=seat, chair" (one or several [more or less] synonymous French
words on the left, and their translation on the right). I'm trying to
build a hashtable that uses the French word as a key, and returns a
list containing English translations:

chaise=>(chair,seat)
fauteuil=>(chair,seat)

I wrote:

%translations=();
while (<FILE>){
    chomp;
    ($left, $right)=split(/=/,$_);
    @words_on_the_left=split (/, /, $left);
    @words_on_the_right=split (/, /, $right);
    for $word (@words_on_the_left){
        if (!defined($translations{$word}))
        {
            $translations{$word}=@words_on_the_right;
        }else{
            $translations{$word}=($translations{$word}, @words_on_the_right);
        }
        print $translations{$word};
    }
}


This doesn't work: instead of displaying the English translations, the
script prints a list of numbers (which makes me think that the lists
are being evaluated in scalar context). I tried to adapt examples I
found on the Web: I replaced "$" with "@" or with "@{$" in several
places, and I also tried to replace
"$translations{$word}=($translations{$word}, @words_on_the_right);"
with "$translations{$word}=($translations{$word},
\@words_on_the_right);". Sometimes the program prints what looks like
pointers, but I don't get the results I expected.
 
nobull

mike said:
0. You should declare your variables with my and use strict.
1. Hash values cannot be arrays. They may be references to arrays.

Good advice.
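
To make point 1 concrete, here is a minimal sketch (the sample words are invented for illustration) of why the original script printed numbers: assigning an array to a hash value evaluates the array in scalar context, which yields its element count, whereas storing an array reference keeps the elements.

```perl
use strict;
use warnings;

my @words_on_the_right = ('seat', 'chair');
my %translations;

# An array assigned to a scalar slot is evaluated in scalar context,
# so only the element count is stored.
$translations{chaise} = @words_on_the_right;
my $count = $translations{chaise};

# Storing a reference to an anonymous copy keeps the elements themselves.
$translations{chaise} = [ @words_on_the_right ];
my $joined = join ', ', @{ $translations{chaise} };

print "$count\n";
print "$joined\n";
```

The first print shows the count (2 here) and the second shows the dereferenced list.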

2. Use exists() to tell whether a hash key exists, not defined().

True, but if (as in this case) you know that the value cannot exist
without being not only defined but also true (i.e. not '0' or ''), then
it is more idiomatic to use neither.
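
A small sketch of the three tests, assuming an invented hash with one key whose value is undef (the keys `vide` and `absent` are hypothetical). An array reference is always true, which is why a plain truth test is enough for this data structure.

```perl
use strict;
use warnings;

my %translations = (
    chaise => [ 'seat', 'chair' ],  # an array reference is always true
    vide   => undef,                # key exists, value is undefined
);

my $exists_vide   = exists  $translations{vide}   ? 1 : 0;
my $defined_vide  = defined $translations{vide}   ? 1 : 0;
my $true_chaise   = $translations{chaise}         ? 1 : 0;
my $exists_absent = exists  $translations{absent} ? 1 : 0;

print "exists(vide)=$exists_vide defined(vide)=$defined_vide\n";
print "true(chaise)=$true_chaise exists(absent)=$exists_absent\n";
```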

3. I don't think you want to add a definition to an already existing
key if that definition already exists.

In other words, you think the OP wants a set. In that case a valueless
hash is the more natural representation to work with. Using grep() and
arrays is definitely the 'B' answer. (Using a regex match to
emulate the string equality operator drops it to a 'C'.)

| for my $word (@words_left) {
|     unless (exists($translations{$word})) {
|         $translations{$word} = [ @words_right ];
|     } else {
|         for my $nw (@words_right) {
|             unless (grep /^$nw$/, @{$translations{$word}}) {
|                 push(@{$translations{$word}}, $nw);
|             }
|         }
|     }
| }

This is unduly complex; use a HoH.

| for my $word (@words_left) {
|     @{$translations{$word}}{@words_right} = ();
| }

Or if you are feeling really terse:

| @{$translations{$_}}{@words_right} = () for @words_left;

Finally once %translations is fully populated as a HoH you can simply
convert to a HoA if you really want one.
| $_ = [ keys %$_ ] for values %translations;
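
Putting the pieces together, here is a minimal end-to-end sketch of the HoH approach. The two lexicon lines are invented sample data in the format from the question, and the sort is only there to make the output order predictable.

```perl
use strict;
use warnings;

# Invented sample lines in the "french, french=english, english" format.
my @lines = (
    'chaise, fauteuil=seat, chair',
    'chaise=chair, stool',
);

my %translations;
for my $line (@lines) {
    my ($left, $right) = split /=/, $line, 2;
    my @words_left  = split /,\s*/, $left;
    my @words_right = split /,\s*/, $right;

    # Hash slice: each translation becomes a key, so duplicates collapse.
    @{ $translations{$_} }{@words_right} = () for @words_left;
}

# Convert the hash of hashes into a hash of (sorted) arrays.
$_ = [ sort keys %$_ ] for values %translations;

print "$_ => (@{ $translations{$_} })\n" for sort keys %translations;
```

Because "chair" appears on both sample lines for "chaise", it ends up in that list only once.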
 
nobull

mike said:
Thanks, nobull. Good advice all around. I was feeling guilty about
that grep... You're right, poor style, and I deserve to get
called on it.

I think you miss my point. If you are going to use an array then
there's nothing particularly wrong with grep. Sure, it will loop all the
way to the end even if you find a hit early on, but when hits are rare
anyway this is no issue.

I agree that a HoH would be better here than a HoA, but didn't want to
do too much violence to the OP's original implementation.

OK, fair enough.

Without changing the HoA implementation, I came up with this, which I
think is better than what I had before:

| for my $word (@words_left) {
|     if (!exists($trans{$word})) {

As I said before, I think it would be more idiomatic to leave out the
exists() as $trans{$word} will never exist but be false.

|         $trans{$word} = [ @words_right ];

There is no need to copy the array - you could just say \@words_right,
provided @words_right is declared with my inside the loop (note that
all the words on one line then share a single array).

|     } else {
|         for my $nw (@words_right) {
|             my $found = 0;
|             for (my $i=0; $i<@{$trans{$word}} && !$found; $i++) {
|                 $found++ if $trans{$word}->[$i] eq $nw;
|             }

There is no need for $i in there:

|             my $found; # undef is a perfectly good false
|             for (@{$trans{$word}}) {
|                 $found++,last if $_ eq $nw;
|             }

Note: $found++ is actually slower than $found=1 but I find it more
idiomatic.

Personally I'd still use grep in this case as the effort of breaking
out of the loop is hardly warranted by the few duplicates we are
expecting.

| push @{$trans{$word}}, $nw
|     unless grep { $_ eq $nw } @{$trans{$word}};
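
As a self-contained sketch of that one-liner in use (the starting contents of %trans and the new words are invented for illustration):

```perl
use strict;
use warnings;

my %trans = ( chaise => [ 'chair', 'seat' ] );   # invented starting state
my @words_right = ( 'seat', 'stool' );

for my $nw (@words_right) {
    # In boolean context grep returns the number of matching elements,
    # so the push is skipped when $nw is already in the list.
    push @{ $trans{chaise} }, $nw
        unless grep { $_ eq $nw } @{ $trans{chaise} };
}

print join(', ', @{ $trans{chaise} }), "\n";
```

Here "seat" is skipped as a duplicate and only "stool" is appended.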
 
Anno Siegel

mike wrote:
| my $found; # undef is a perfectly good false
| for (@{$trans{$word}}) {
|     $found++,last if $_ eq $nw;
| }

Note: $found++ is actually slower than $found=1 but I find it more
idiomatic.

Personally I'd still use grep in this case as the effort of breaking
out of the loop is hardly warranted by the few duplicates we are
expecting.

Then there's List::Util::first, for cases where it matters.
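
For instance, a minimal sketch (the list contents are invented) might look like this. first stops scanning at the first element for which the block is true and returns that element, or undef if there is none, so the result is tested with defined in case a matching element is itself false ('0' or '').

```perl
use strict;
use warnings;
use List::Util qw(first);

my @known = ( 'chair', 'seat' );   # invented existing translations

for my $nw ( 'seat', 'stool' ) {
    # Short-circuits on the first match instead of scanning the whole list.
    push @known, $nw
        unless defined( first { $_ eq $nw } @known );
}

print "@known\n";
```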

Anno
 
