differences between hashes and arrays ?

J

Jack

Hi folks I am not an expert in perl but correct me if I am wrong -

is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Thanks,

Jack
 
M

Mirco Wahab

Thus spoke Jack (on 2006-09-22 16:55):
is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

There is a similar interface, but thats almost all.

==> http://en.wikipedia.org/wiki/Hash_table

In short, a 'hash table' is a complicated organism
which allows you to find an item (say: key, stored
in there) in very very short time, which is the point.

An "array" means otherwise "access by an index number",
so array elements must be searched in order to
find a particular item.

compare:

my $item = $hash{ 'ITEM_OF_INTEREST' }; # gets the item

vs.

SEARCH: foreach $element (@array) {
if($element eq 'ITEM_OF_INTEREST') {
$item = $element; # gets the item
last SEARCH;
}
}
why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

No, folks using hashes if hash tables are appropriate
and arrays if arrays are ...

Say, you have ten million 3D coordinate points
which are in some order already (an order that
means something to you). Why would you bother to
put them in a hash table, if all you need is sub-
sequent access to the 13244'st or 10'th to 1923331'th?

Regards

Mirco
 
T

Tad McClellan

Jack said:
is it true you can use a hash and work with it just as you would an
array -

No.


what are the differences between them


Arrays are ordered and indexed with numbers.

Hashes are UNordered and indexed with strings.

(besides in an array you
can have a multidimensional array) ?


You can also have a multidimensional hash, so that is not
one of the differences.
 
P

Paul Lalli

Jack said:
Hi folks I am not an expert in perl but correct me if I am wrong -

You're wrong. :)
is it true you can use a hash and work with it just as you would an
array
No.

- what are the differences between them

A hash is by definition unordered. There is no ordering of any kind to
the key/value pairs in a hash. An array is ordered - there is a very
well defined sequence of which item follows which item. Therefore, you
can treat an array like a stack or a queue (using the push/pop
shift/unshift set of functions), while this is not possible with a
hash. With a hash you can associate to distinct pieces of data in one
structure, and can retrieve a specific value in very efficient time.

An array is indexed by positive integers. A hash is keyed by any
string, whether its contents are numeric or not.
(besides in an array you
can have a multidimensional array) ?

Ironically, that is *not* one of the differences. You can have
multidimensional hashes as well. In fact, you can have hashes of
arrays, arrays of hashes, arrays of hashes of hashes, hashes of arrays
of hashes, or any other combination you'd like.
why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Faster to access, yes. Same or less memory, no.

These are not apples to apples. You use a hash and an array for
completely different purposes.

If you have an example of a program or algorithm you're working on, we
could help you get rid of some of your misperceptions. . . .

Paul Lalli
 
X

xhoster

Jack said:
Hi folks I am not an expert in perl but correct me if I am wrong -

is it true you can use a hash and work with it just as you would an
array

That would depend on what you are doing. For some things, you could easily
use them interchangably. For most things, you can't. If youre indices are
not integers, you can't use arrays. When your indices are integers but are
not densely packed but need to accessed as though they were, you cannot
readily use hashes.

- what are the differences between them (besides in an array you
can have a multidimensional array) ?

That is not a difference between them (in Perl).
why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Wrong on both counts. When arrays and hashes are interchangable, hashes
are much slower and take substantially more memory.


Xho
 
J

Jack

Michele said:
Hi folks I am not an expert in perl but correct me if I am wrong -

Seeing your question and your considerations below, indeed you seem
not to be an expert in perl. So I guess you're not wrong.
is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

You can also have multidimensional hashes. But both are not "real"
multidimentional beasts. However for the moment don't bother...

Really hashes are associative arrays, i.e. they are a finite mapping
between a set of _keys_ and a set _values_. The keys are generic
strings. An array is a mapping between a finite set of natural numbers
of the form {0,... , n} (if you don't set $[ to something other than
0, but again, don't bother!) and a set of values.
why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

You will certainly know that in perl conversions between numbers and
strings happen automagically all the time. So indeed you may use a
hash to implement an array. However hashes are called like that
because their implementation is based on hash tables that provide
means to quickly search through the keys. There's no particular
general advantage when the keys are simple numbers instead.

So use an array when you need a mapping from a finite set of numbers
to some values, except possibly when the mapping is very sparse. How
much is "very", may depend on many factors, and I don't know the lower
limit exactly. Surely I would use a hash if I need to *explicitly* set
the mapping amongst say 100 numbers uniformely distribuited in
0..100_000_000 and some values. I don't have the vaguest idea how well
the hashing algorithm would hash such keys, but the thing can be
investigated...


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Thanks for the comments all. For these applications, which is better
suited, array or hash, and assume a very large volume of data and
consider memory :
1- iterate through a huge list (hash or array?) of values and compare
each value in the list, 1 by 1, to a match condition test like
m/^\d.*?\d$/ (all digits), and then keep the count of the values that
pass the match test
2- compare all values in column 1, file 1 (one value at a time ; assume
10M records ) to determine if the value exists in the target list (hash
or array?) from column 2 file 2 (assume column 2 is 20M records)

Thank you,
Jack
 
P

Paul Lalli

Jack said:
Thanks for the comments all. For these applications, which is better
suited, array or hash, and assume a very large volume of data and
consider memory :
1- iterate through a huge list (hash or array?) of values and compare
each value in the list, 1 by 1, to a match condition test like
m/^\d.*?\d$/ (all digits), and then keep the count of the values that
pass the match test

array. You just want to iterate over all of them, in sequence. No
reason for the overhead of a hash.

my $count = grep /^\d.*?\d$/ @great_big_array;

(by the way, you do know that your pattern does not match "all digits",
right? It matches anything that starts with and ends with a digit, of
at least length two. The "inside" of the value can be literally
anything.)
2- compare all values in column 1, file 1 (one value at a time ; assume
10M records ) to determine if the value exists in the target list (hash
or array?) from column 2 file 2 (assume column 2 is 20M records)

column1, file1 - array. You want to iterate over all of them, one at a
time, in sequence.

column2, file2 - hash. You want to access a specific element of your
set, without knowing or caring where in the set that element falls.

for my $value (@column1){
if (exists $column2{$value}) {
print "$value exists in column2 file2\n";
} else {
print "$vlaue does not exist in column2 file2\n";
}
}

Paul Lalli
 
B

Brian McCauley

Wrong on both counts. When arrays and hashes are interchangable, hashes
are much slower and take substantially more memory.

Except in the rare and special case of a very sparsely populated array.
 
H

Henry Law

Paul said:
You're wrong. :)


No.

Of course all the material about the difference in the ways that hashes
and arrays is perfectly correct, but there is one sense in which the OP
is correct - or at least has some method in his madness. Look:

# -------------------------------------------
#! /usr/bin/perl

use strict; use warnings;

use Data::Dumper;

my @array = ("one",111,"two",22);
# Or we could even have coded my %hash = ("one",111,"two",22);
my %hash = @array;

print Dumper(\%hash);
# --------------------------------------------
F:\>tryit.pl
$VAR1 = {
'one' => 111,
'two' => 22
};
 
P

Paul Lalli

Henry said:
Of course all the material about the difference in the ways that hashes
and arrays is perfectly correct, but there is one sense in which the OP
is correct - or at least has some method in his madness. Look:

# -------------------------------------------
#! /usr/bin/perl

use strict; use warnings;

use Data::Dumper;

my @array = ("one",111,"two",22);
# Or we could even have coded my %hash = ("one",111,"two",22);
my %hash = @array;

None of this shows "using a hash and array in the same way". This is
populating an array with a list of values, and then populating a hash
with the list of values that are currently contained in an array.
%hash and @array have nothing to do with each other before or after
this statement.

Make sure you read and understand
perldoc -q difference
"What is the difference between a list and an array"

Paul Lalli
 
T

Tad McClellan

Henry Law said:
Of course all the material about the difference in the ways that hashes
and arrays is perfectly correct, but there is one sense in which the OP
is correct - or at least has some method in his madness. Look:

# -------------------------------------------
#! /usr/bin/perl

use strict; use warnings;

use Data::Dumper;

my @array = ("one",111,"two",22);


Change that line to:

my @array = ("two",22,"one",111);

and run the program again.

# Or we could even have coded my %hash = ("one",111,"two",22);
my %hash = @array;

print Dumper(\%hash);
# --------------------------------------------
F:\>tryit.pl
$VAR1 = {
'one' => 111,
'two' => 22
};


You get the _same_ output!
 
U

Uri Guttman

PL> Faster to access, yes. Same or less memory, no.

hmm, in general arrays have faster lookups than hashes. me thinks you
confused the OP's statement. and arrays definitely use less storage
(even excluding the key itself) for each element.

PL> These are not apples to apples. You use a hash and an array for
PL> completely different purposes.

they are apples and bananas. :)

PL> If you have an example of a program or algorithm you're working on, we
PL> could help you get rid of some of your misperceptions. . . .

or he should learn more perl before asking such deep questions. :)

uri
 
P

Paul Lalli

Uri said:
PL> Faster to access, yes. Same or less memory, no.

hmm, in general arrays have faster lookups than hashes. me thinks you
confused the OP's statement.

Or I confused my answer. :) What I meant was that:

my $there = $hash{$key} ? 1 : 0;
is faster than:
my $there = 0;
for my $elem (@array) {
if ($elem eq $value) {
$there = 1;
next;
}
}

Paul Lalli
 
J

Jürgen Exner

Jack said:
is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

Both are mappings. Arrays map from a contiguous list of natural numbers,
starting with 0 (assuming you didn't fool around with $[) into scalars.
Hashes are a generalization and map arbitrary strings to scalars.
why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Very often you need organize your data in a sequence. This sequencing is
naturally achieved by the natural order of number when they are used as
domain of your mapping. For hashes it is possible to sort the keys of a
hash and then access the values in sequence, but this operation is much more
expensive.

jue
 
J

Jürgen Exner

Mirco said:
An "array" means otherwise "access by an index number",
so array elements must be searched in order to
find a particular item.

Not really. To access the 23rd element of an array in a typical
implementation you would simply access
StartOfArray + 23 * SizeOfArrayElement
which is just adding a constant that is independant of the size of the
array. No need to search (which would be O(n log n)).

jue
 
U

Uri Guttman

PL> Or I confused my answer. :) What I meant was that:

PL> my $there = $hash{$key} ? 1 : 0;
PL> is faster than:
PL> my $there = 0;
PL> for my $elem (@array) {
PL> if ($elem eq $value) {
PL> $there = 1;
PL> next;
PL> }
PL> }

well then, why didn't you say that!! string key lookup is a very diff
beast than element lookup. :)

uri
 
U

Uri Guttman

CM> That's true in most languages because in most languages arrays
CM> are required to have all elements be the same type, making
CM> looking up an element a simple matter of pointer arithmetic
CM> (you see this in its most naked form in a higher-level language
CM> in C, where arrays and pointers are the same thing, and array
CM> lookups are defined in the language as pointer arithmetic.
CM> This results in a lot of C's less cuddly quirks, but that's
CM> another subject...). But in Perl, each element can be
CM> entirely different and each element has its own size. There
CM> are several different ways Perl can handle this problem; I
CM> don't know Perl internals so I don't know how it does it.
CM> But however it's done, it's going to take more cycles than
CM> a simple homogenous array (and probably more space, too).

perl's arrays ARE homogeneous but at a different level than scalar
values. the SV's of an array are in a real c array and are indexed just
like you say. otherwise how would perl lookup an array element? do you
think it scans the array in some linked list format for the nth element?

but my quoted statement is still true - perl arrays are faster and use
less ram than hashes. this is easily tested with benchmark.pm and
the devel::size (iirc the name correctly) module that shows ram
usage. so you don't need to know perl internals to learn about array vs
hash resource usage.

uri
 
P

Peter J. Holzer

That's true in most languages because in most languages arrays
are required to have all elements be the same type, making
looking up an element a simple matter of pointer arithmetic
(you see this in its most naked form in a higher-level language
in C, where arrays and pointers are the same thing,

It seems people are just as confused about the difference between arrays
and pointers in C as they are about the difference between arrays and
lists in Perl :)

Arrays and pointers in C are definitely not the same thing, An array is
a group of elements of the same type, while a pointer, well, points to
an element of a certain type. You notice the difference when you try to
assign to an array (doesn't work) or when you use sizeof on a pointer
(probably not what you wanted), and very spectacularly if you define a
variable as "int a[10]" in one source file and declare it as "extern int
*a" in another. However:
and array lookups are defined in the language as pointer arithmetic.

This is true. (Actually it's the other way round: Pointer arithmetic is
defined in terms as accesses to array elements)

When you use an array as an rvalue in an expression, it is converted to
a pointer to its first element. Thus you can use an array as an rvalue
whereever you can use a pointer.

(There is also the syntactic quirk that you cannot use arrays as
parameters, but it's not an error to try - the compiler will just
silently convert the array declaration to a pointer declaration)

hp
 
D

Dr.Ruud

Chris Mattern schreef:
Jack:

Um, no, arrays are much more efficient than hashes.

No, they are and they aren't. It all depends on what you use them for.

<example type="bad">

my %hary = (
0 => q{foo},
1 => q{bar},
) ;

my @ash = (
[q{dgd vggw ggg tgregdtg trgregfdsggt gtdeg trg}, q{foo}] ,
[q{wgdeg ergrewg rewgrew grewg ewg ewgew gewrgewg}, q{bar}] ,
) ;

</example>
 
J

Josef Moellers

Jürgen Exner said:
Not really. To access the 23rd element of an array in a typical
implementation you would simply access
StartOfArray + 23 * SizeOfArrayElement
which is just adding a constant that is independant of the size of the
array. No need to search (which would be O(n log n)).

How then do you know it's item #23?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,566
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top