Very simple hash/regex question

Tuxedo · Aug 23, 2012

What is a simple way to copy a hash into for example %hash_copy and change
all characters in the keys of the copied hash to lowercases and all
whitespaces to underscores?

my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

my %hash_copy = %hash;

...?

Many thanks for any suggestions.

Tuxedo

Klaus · Aug 23, 2012

What is a simple way to copy a hash into for example %hash_copy and change
all characters in the keys of the copied hash to lowercases and all
whitespaces to underscores?

use 5.014;
my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

my %hash_copy = map { lc($_ =~ s/ /_/gr) => $hash{$_} } %hash;

Peter J. Holzer · Aug 23, 2012

use 5.014;
my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

my %hash_copy = map { lc($_ =~ s/ /_/gr) => $hash{$_} } %hash;

use Data::Dumper;
say Dumper \%hash_copy;

$VAR1 = {
'my_first_subject_key' => 'my first value',
'my_second_value' => undef,
'my_first_value' => undef,
'my_second_subject_key' => 'my second value'
};

not quite what Tuxedo wanted, I think.

hp

Tim McDaniel · Aug 23, 2012

use 5.014;
my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

my %hash_copy = map { lc($_ =~ s/ /_/gr) => $hash{$_} } %hash;

Fundamental problem 0: You didn't test the proposal.

1: %hash returns both keys and values, so hash_copy would get two
entries for each one in the original table, one of them being
"transformed value => undef". You want "map{...}keys %hash".

2: I don't believe Perl defines an order of operations in this case,
where one part of the expression modifies $_ and another part uses
it. If it evaluates left to right, then $hash{$_} will try to use the
transformed $_, so it won't find the value. (This bit me on my first
attempt too.)

3: The spec is "all whitespaces". I think that means \s, not ' '.

4: "$_ =~ ..." is redundant, because $_ is the default operand of s///.

My correction to that:

use 5.014;
my %hash_copy = map { my $key = $_; lc(s/\s/_/gr) => $hash{$key} } keys %hash;

But since you need a temp anyway (or so I think), there's no need for
s///r, so no need to require 5.014. So this also works without 5.014:

my %hash_copy = map { my $orig_ = $_; s/\s/_/g; lc($_) => $hash{$orig_} } keys %hash;

On the whole, I think this looks cleaner with just a plain loop:

my %hash_copy = ();
while (my ($key, $value) = each %hash) {
    $key =~ s/\s/_/g;
    $hash_copy{lc($key)} = $value;
}
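
For anyone following along, here is a small sketch of points 1 and 2 above. It assumes the two-entry %hash from the original question; @flat and $new_key are names made up for the illustration. It shows that %hash in list context yields keys and values alike, and that copying the key before changing it avoids any dependence on evaluation order.

```perl
use strict;
use warnings;

my %hash = (
    'My first subject key'  => 'my first value',
    'my second Subject key' => 'my second value',
);

# In list context %hash flattens to key, value, key, value, ...
my @flat = %hash;
print scalar(@flat), "\n";    # 4 items for 2 pairs

# Iterate over the keys only, and transform a copy of each key
my %hash_copy = map {
    my $new_key = lc $_;         # lowercase a copy of the key
    $new_key =~ s/\s/_/g;        # each whitespace character becomes "_"
    ( $new_key => $hash{$_} )    # $_ still holds the original key
} keys %hash;

print "$_ => $hash_copy{$_}\n" for sort keys %hash_copy;
```

Because the substitution only ever touches $new_key, the lookup $hash{$_} sees the original key no matter which side of the pair is evaluated first.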

Tuxedo · Aug 23, 2012

Peter said:
use Data::Dumper;
say Dumper \%hash_copy;

$VAR1 = {
'my_first_subject_key' => 'my first value',
'my_second_value' => undef,
'my_first_value' => undef,
'my_second_subject_key' => 'my second value'
};

not quite what Tuxedo wanted, I think.

hp

I'm not quite sure either to be honest....

What I have so far is a hash, like:

my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

First I thought I should duplicate the hash into a copy named for example
%hash_copy. Then modify the keys, not the values.

I would then run the script via cgi parameters, using the keys as they
appear in the lowercase and with underscores, e.g. 'my_second_subject_key'.

use CGI qw(param);
my $subject = param('subject');

The original %hash keys can contain spaces and capitals.

I would then like to access and print the key string in its original
whitespace and partly uppercase format, e.g. 'my second Subject key' when
accessing the script by for example.pl?subject=my_second_subject_key

Maybe it will be necessary to know the position of the given key in the
%hash_copy in order to access the key string in the same numerical position
as in the original %hash?

The idea is simply to access and print an original key string, based on the
modified one in the query string, without unnecessarily resorting to the
idea of maintaining near duplicate hashes manually....

As mentioned, I'm not quite sure which is the best way to go about this.

Many thanks for any ideas.

Tuxedo

Tuxedo · Aug 23, 2012

Tim said:
Fundamental problem 0: You didn't test the proposal.

1: %hash returns both keys and values, so hash_copy would get two
entries for each one in the original table, one of them being
"transformed value => undef". You want "map{...}keys %hash".

2: I don't believe Perl defines an order of operations in this case,
where one part of the expression modifies $_ and another part uses
it. If it evaluates left to right, then $hash{$_} will try to use the
transformed $_, so it won't find the value. (This bit me on my first
attempt too.)

3: The spec is "all whitespaces". I think that means \s, not ' '.

4: "$_=~..." is the default operand.

My correction to that:

use 5.014;
my %hash_copy = map { my $key = $_; lc(s/\s/_/gr) => $hash{$key} } keys %hash;

But since you need a temp anyway (or so I think), there's no need for
s///r, so no need to require 5.014. So this also works without 5.014:

my %hash_copy = map { my $orig_ = $_; s/\s/_/g; lc($_) => $hash{$orig_} } keys %hash;

On the whole, I think this looks cleaner with just a plain loop:

my %hash_copy = ();
while (my ($key, $value) = each %hash) {
    $key =~ s/\s/_/g;
    $hash_copy{lc($key)} = $value;
}

Thanks for posting these examples. I will test.

Tuxedo

Klaus · Aug 23, 2012

Fundamental problem 0: You didn't test the proposal.

Doh and double-doh, I typed Perl code on the fly and I messed up!

You (and Peter J. Holzer) are of course right. I didn't test my code
and I apologise.

Tim McDaniel · Aug 23, 2012

I'm not quite sure either to be honest....

Well, it's hard to get to a desired destination when you don't know
where you're going ...

What I have so far is a hash, like:

my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

First I thought I should duplicate the hash into a copy named for example
%hash_copy. Then modify the keys, not the values.

Well, to be precise, you can't per se modify the key of a hash
element. You can get the effect of that by creating a new hash
member, assigning an older value to be its value, then deleting the
older key and its value.

I would then run the script via cgi parameters, using the keys as they
appear in the lowercase and with underscores, e.g. 'my_second_subject_key'.

use CGI qw(param);
my $subject = param('subject');

The original %hash keys can contain spaces and capitals.
I would then like to access and print the key string in its original
whitespace and partly uppercase format, e.g. 'my second Subject key'
when accessing the script by for
example.pl?subject=my_second_subject_key

Had it been a case of needing to go from 'my second Subject key' to
'my_second_subject_key', you could use a hash or a sub. However, the
other direction is one to many -- given 'my_second_subject_key', you
can't tell what the original was. So you'd have to use a hash. In
any event, you have to determine whether it's possible to have a
collision like "my second Subject key" and "mY SeCond\tSUBJECT\rKeY",
and if so, what you plan to do about it.
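
One possible way to check for such collisions, sketched under the assumption that normalizing means "lowercase, then each whitespace character to an underscore" as discussed earlier. normalize(), @originals and %originals_for are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# normalize() is a hypothetical helper, not something from the thread
sub normalize {
    my ($key) = @_;
    ( my $norm = lc $key ) =~ s/\s/_/g;
    return $norm;
}

my @originals = (
    'my second Subject key',
    "mY SeCond\tSUBJECT\rKeY",
    'My first subject key',
);

# Group the original keys by their normalized form
my %originals_for;
push @{ $originals_for{ normalize($_) } }, $_ for @originals;

# Any group with more than one member is a collision
my @collisions = grep { @{ $originals_for{$_} } > 1 } sort keys %originals_for;
print "collision on: $_\n" for @collisions;
```

What to do on a collision (reject the second key, keep a list, add a suffix) is still a decision the data owner has to make; the sketch only detects it.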

Maybe it will be necessary to know the position of the given key in
the %hash_copy in order to access the key string in the same
numerical position as in the original %hash?

You cannot access a hash by a numerical position. You can only go
directly to an element via its key.

The idea is simply to access and print an original key string, based
on the modified one in the query string, without unnecessarily
resorting to the idea of maintaining near duplicate hashes
manually....

Well, sorry, but that's what you're going to have to do.

If you are planning to do changes and references in lots of places,
then you might encapsulate the tracking in a module, or even a class,
with map_original_to_normalized(), map_normalized_to_original(), add,
delete, and such.
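
A rough sketch of what such a class might look like. The two map_... method names are the ones suggested above; the package name KeyMap, the add() behaviour and the internal field names are assumptions made for the illustration, and delete is left out for brevity.

```perl
use strict;
use warnings;

package KeyMap;    # hypothetical package name

sub new { return bless { to_norm => {}, to_orig => {} }, shift }

# Register an original key; dies if a different original already
# normalizes to the same string
sub add {
    my ( $self, $original ) = @_;
    ( my $norm = lc $original ) =~ s/\s/_/g;
    my $seen = $self->{to_orig}{$norm};
    die "'$original' collides with '$seen'\n"
        if defined $seen && $seen ne $original;
    $self->{to_norm}{$original} = $norm;
    $self->{to_orig}{$norm}     = $original;
    return $norm;
}

sub map_original_to_normalized { $_[0]{to_norm}{ $_[1] } }
sub map_normalized_to_original { $_[0]{to_orig}{ $_[1] } }

package main;

my $map = KeyMap->new;
$map->add($_) for 'My first subject key', 'my second Subject key';

print $map->map_normalized_to_original('my_second_subject_key'), "\n";
print $map->map_original_to_normalized('My first subject key'),  "\n";
```

Keeping both directions behind one add() call means the two lookup tables cannot drift apart, which is the main point of encapsulating them.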

Tuxedo · Aug 24, 2012

Tim McDaniel wrote:

[...]

collision like "my second Subject key" and "mY SeCond\tSUBJECT\rKeY",
and if so, what you plan to do about it.

Thanks for mentioning this, it could indeed happen.

[...]

You cannot access a hash by a numerical position. You can only go
directly to an element via its key.

I suspected as much, so I'm not sure how it can be done.

Well, sorry, but that's what you're going to have to do.

All I had planned was to keep normal key values as they would be written in
a natural language, then change them to a format which contains no spaces
or capitals, then access both the original (normalised) key strings and
values as well as the modified ones with underscores, while only knowing
the modified key string at the time the script runs on a CGI request.
Instead, to maintain two separate hashes which can be accessed by the same
key-string in the query string can of course be done. It just means
duplicating some information manually, which is no big deal, although I
suspect it can be done better. Anyway, then there would be the main hash:

my %hash = ('my_first_subject_key' => 'my first value',
'my_second_subject_key' => 'my second value');

And an additional hash, providing the normalised word strings as values:

my %hash_normalize = ('my_first_subject_key' => 'My first subject key',
'my_second_subject_key' => 'my second Subject key');

I can now access both normal and modified versions using one parameter as a
key to both hashes.

If you are planning to do changes and references in lots of places,
then you might encapsulate the tracking in a module, or even a class,
with map_original_to_normalized(), map_normalized_to_original(), add,
delete, and such.

I'm not sure what kind of tracking module you refer to? Also, I don't fully
understand the map_original_to_normalized() and
map_normalized_to_original() class ideas.

Or maybe some other data structure could be better suited for my purpose.

Many thanks,
Tuxedo

Tuxedo · Aug 24, 2012

Ben Morrow wrote:

[...]

I would structure this like this:

my %hash = (
    my_first_subject_key => {
        key => "My first subject key",
        value => "my first value",
    },
    ...
);

See perllol, perldsc and perlreftut. Of course, you might want to give
the subhashes more meaningful keys than 'key' and 'value'.
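
A minimal sketch of how a structure like this might be generated from the natural-language hash rather than typed by hand. %original and $subject are placeholder names, and $subject merely stands in for whatever param('subject') would return in the CGI script.

```perl
use strict;
use warnings;

my %original = (
    'My first subject key'  => 'my first value',
    'my second Subject key' => 'my second value',
);

# Build the nested structure: normalized key => { key => ..., value => ... }
my %hash;
for my $key ( keys %original ) {
    ( my $norm = lc $key ) =~ s/\s/_/g;
    $hash{$norm} = { key => $key, value => $original{$key} };
}

# $subject stands in for param('subject') from the CGI request
my $subject = 'my_second_subject_key';
if ( my $entry = $hash{$subject} ) {
    print "$entry->{key}: $entry->{value}\n";
}
```

Only %original is maintained by hand here; the lookup structure is derived from it each time the script starts.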

Ben

Thanks for the example. I will delve into those manuals.

Tuxedo

Tim McDaniel · Aug 24, 2012

Quoth (e-mail address removed):

I believe the order of operations is always well-defined in Perl: that
is, I don't know of any cases where it's been changed, nor any cases
where changing the order wouldn't be considered a bug.

TL;DR: show me where Perl systematically talks about order of
evaluation, except implicitly in some places, or talks about anything
like "sequence points".

I don't know of any place that Perl explicitly defines the order of
evaluation and refers to anything like C's "sequence points", except
where implied by things like "If the argument before the ? is true,
the argument before the : is returned, otherwise the argument after
the : is returned.". For example, for ++ and --, man perlop has

Note that just as in C, Perl doesn't define when the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that
modifying a variable twice in the same statement will lead to
undefined behavior. Avoid statements like:

$i = $i ++;
print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements
is.

But C actually *does* define when the increment or decrement happens:
some time after the previous sequence point and before the next one.
C would not have sequence points in the problematic areas above, mind
you, so that wouldn't matter in these two lines. But C does define
that in
... (i++, i) ...
the increment happens no later than the comma operator, so the value
of "i" alone is the incremented version. (If I'm reading a draft
standard right, if it matches the current version, and if my old
neurons are firing right.)

In this case, '=>' is just sugar for ',', so the order would be
well-defined even in C.

No it would not. In the map above, the tokens "=>" and "," are not
comma operators, which in C would cause a sequence point. The only
place I know of in C where "," represents a list of values is in
initializations of arrays or structs or the like, and the draft I saw
(can't find the real standard) has "The evaluations of the
initialization list expressions are indeterminately sequenced with
respect to one another and thus the order in which any side effects
occur is unspecified."

Peter J. Holzer · Aug 25, 2012

Fundamental problem 0: You didn't test the proposal.

1: %hash returns both keys and values, so hash_copy would get two
entries for each one in the original table, one of them being
"transformed value => undef". You want "map{...}keys %hash".

Yup.

2: I don't believe Perl defines an order of operations in this case,
where one part of the expression modifies $_ and another part uses
it. If it evaluates left to right, then $hash{$_} will try to use the
transformed $_, so it won't find the value. (This bit me on my first
attempt too.)

$_ isn't transformed because of the /r modifier. So the order doesn't
matter (although I agree with Ben that it's well-defined in this case).

My correction to that:

use 5.014;
my %hash_copy = map { my $key = $_; lc(s/\s/_/gr) => $hash{$key} } keys %hash;

But since you need a temp anyway (or so I think), there's no need for
s///r, so no need to require 5.014.

The /r avoids the need for the temporary variable. You either need a
temporary variable (then it works with any version of perl) or /r (then
you need 5.14), but not both.
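
To make the difference concrete, a short sketch (the variable names $key, $with_r and $copy are made up for the example; the /r form needs perl 5.14 or later):

```perl
use 5.014;
use warnings;

my $key = 'my second Subject key';

# With /r the substitution returns a new string and leaves $key alone
my $with_r = lc( $key =~ s/\s/_/gr );
say $key;       # my second Subject key
say $with_r;    # my_second_subject_key

# Without /r the substitution edits its target in place, so work on a copy
( my $copy = $key ) =~ s/\s/_/g;
say lc $copy;   # my_second_subject_key
```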

hp

Tim McDaniel · Aug 25, 2012

$_ isn't transformed because of the /r modifier. So the order doesn't
matter (although I agree with Ben that it's well-defined in this case).

I am not familiar with s///r, as $ORKPLACE doesn't have the current
Perl. Thank you for the correction.

Were it to depend on the order of effects (if there were not explicit
definition as, for example, && and || provide), I would intensely
dislike it, even if experimentally it were to work.

Rainer Weikusat · Aug 25, 2012

[...]

my $i = 2;
my $j;
$j = \++$i, $i = 10, say $$j;

will print '10' despite the assignment to $j happening before the
assignment to $i.

Eh ... considering that the value of $j is a reference to $i, what
else should $$j print except the current value of $i?
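
A stripped-down illustration of the same point, leaving the pre-increment out: a reference follows the variable, whereas a plain copy does not. $snapshot is a name invented for the example.

```perl
use strict;
use warnings;

my $i = 2;
my $j = \$i;     # $j refers to the variable $i, not to a snapshot of its value
$i = 10;
print "$$j\n";   # 10

my $snapshot = $i;   # a plain copy is independent of later changes
$i = 11;
print "$snapshot $$j\n";   # 10 11
```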

John W. Krahn · Aug 27, 2012

Tuxedo said:
What is a simple way to copy a hash into for example %hash_copy and change
all characters in the keys of the copied hash to lowercases and all
whitespaces to underscores?

my %hash = ('My first subject key' => 'my first value',
'my second Subject key' => 'my second value');

my %hash_copy = %hash;

..?

$ perl -e'
use Data::Dumper;
my %hash = (
    q/My first subject key/ => q/my first value/,
    q/my second Subject key/ => q/my second value/,
);
my %hash_copy = %hash;
print Dumper \%hash_copy;
for my $key ( keys %hash_copy ) {
    ( my $new_key = lc $key ) =~ s/\s/_/g;
    $hash_copy{ $new_key } = delete $hash_copy{ $key };
}
print Dumper \%hash_copy;
'
$VAR1 = {
'My first subject key' => 'my first value',
'my second Subject key' => 'my second value'
};
$VAR1 = {
'my_first_subject_key' => 'my first value',
'my_second_subject_key' => 'my second value'
};

John
