differences between hashes and arrays ?

Discussion in 'Perl Misc' started by Jack, Sep 22, 2006.

  1. Jack

    Jack Guest

    Hi folks I am not an expert in perl but correct me if I am wrong -

    is it true you can use a hash and work with it just as you would an
    array - what are the differences between them (besides in an array you
    can have a multidimensional array) ?

    why arent folks using hashes instead of arrays since (I believe) they
    are faster to access and take up the same or less memory than arrays..

    Thanks,

    Jack
     
    Jack, Sep 22, 2006
    #1
    1. Advertisements

  2. Jack

    Mirco Wahab Guest

    Thus spoke Jack (on 2006-09-22 16:55):
    There is a similar interface, but thats almost all.

    ==> http://en.wikipedia.org/wiki/Hash_table

    In short, a 'hash table' is a complicated organism
    which allows you to find an item (say: key, stored
    in there) in very very short time, which is the point.

    An "array" means otherwise "access by an index number",
    so array elements must be searched in order to
    find a particular item.

    compare:

    my $item = $hash{ 'ITEM_OF_INTEREST' }; # gets the item

    vs.

    SEARCH: foreach $element (@array) {
    if($element eq 'ITEM_OF_INTEREST') {
    $item = $element; # gets the item
    last SEARCH;
    }
    }
    No, folks using hashes if hash tables are appropriate
    and arrays if arrays are ...

    Say, you have ten million 3D coordinate points
    which are in some order already (an order that
    means something to you). Why would you bother to
    put them in a hash table, if all you need is sub-
    sequent access to the 13244'st or 10'th to 1923331'th?

    Regards

    Mirco
     
    Mirco Wahab, Sep 22, 2006
    #2
    1. Advertisements


  3. Arrays are ordered and indexed with numbers.

    Hashes are UNordered and indexed with strings.


    You can also have a multidimensional hash, so that is not
    one of the differences.
     
    Tad McClellan, Sep 22, 2006
    #3
  4. Jack

    Paul Lalli Guest

    You're wrong. :)
    A hash is by definition unordered. There is no ordering of any kind to
    the key/value pairs in a hash. An array is ordered - there is a very
    well defined sequence of which item follows which item. Therefore, you
    can treat an array like a stack or a queue (using the push/pop
    shift/unshift set of functions), while this is not possible with a
    hash. With a hash you can associate to distinct pieces of data in one
    structure, and can retrieve a specific value in very efficient time.

    An array is indexed by positive integers. A hash is keyed by any
    string, whether its contents are numeric or not.
    Ironically, that is *not* one of the differences. You can have
    multidimensional hashes as well. In fact, you can have hashes of
    arrays, arrays of hashes, arrays of hashes of hashes, hashes of arrays
    of hashes, or any other combination you'd like.
    Faster to access, yes. Same or less memory, no.

    These are not apples to apples. You use a hash and an array for
    completely different purposes.

    If you have an example of a program or algorithm you're working on, we
    could help you get rid of some of your misperceptions. . . .

    Paul Lalli
     
    Paul Lalli, Sep 22, 2006
    #4
  5. Jack

    xhoster Guest

    That would depend on what you are doing. For some things, you could easily
    use them interchangably. For most things, you can't. If youre indices are
    not integers, you can't use arrays. When your indices are integers but are
    not densely packed but need to accessed as though they were, you cannot
    readily use hashes.

    That is not a difference between them (in Perl).
    Wrong on both counts. When arrays and hashes are interchangable, hashes
    are much slower and take substantially more memory.


    Xho
     
    xhoster, Sep 22, 2006
    #5
  6. Jack

    Jack Guest

    Thanks for the comments all. For these applications, which is better
    suited, array or hash, and assume a very large volume of data and
    consider memory :
    1- iterate through a huge list (hash or array?) of values and compare
    each value in the list, 1 by 1, to a match condition test like
    m/^\d.*?\d$/ (all digits), and then keep the count of the values that
    pass the match test
    2- compare all values in column 1, file 1 (one value at a time ; assume
    10M records ) to determine if the value exists in the target list (hash
    or array?) from column 2 file 2 (assume column 2 is 20M records)

    Thank you,
    Jack
     
    Jack, Sep 22, 2006
    #6
  7. Jack

    Paul Lalli Guest

    array. You just want to iterate over all of them, in sequence. No
    reason for the overhead of a hash.

    my $count = grep /^\d.*?\d$/ @great_big_array;

    (by the way, you do know that your pattern does not match "all digits",
    right? It matches anything that starts with and ends with a digit, of
    at least length two. The "inside" of the value can be literally
    anything.)
    column1, file1 - array. You want to iterate over all of them, one at a
    time, in sequence.

    column2, file2 - hash. You want to access a specific element of your
    set, without knowing or caring where in the set that element falls.

    for my $value (@column1){
    if (exists $column2{$value}) {
    print "$value exists in column2 file2\n";
    } else {
    print "$vlaue does not exist in column2 file2\n";
    }
    }

    Paul Lalli
     
    Paul Lalli, Sep 22, 2006
    #7
  8. Except in the rare and special case of a very sparsely populated array.
     
    Brian McCauley, Sep 22, 2006
    #8
  9. Jack

    Henry Law Guest

    Of course all the material about the difference in the ways that hashes
    and arrays is perfectly correct, but there is one sense in which the OP
    is correct - or at least has some method in his madness. Look:

    # -------------------------------------------
    #! /usr/bin/perl

    use strict; use warnings;

    use Data::Dumper;

    my @array = ("one",111,"two",22);
    # Or we could even have coded my %hash = ("one",111,"two",22);
    my %hash = @array;

    print Dumper(\%hash);
    # --------------------------------------------
    F:\>tryit.pl
    $VAR1 = {
    'one' => 111,
    'two' => 22
    };
     
    Henry Law, Sep 22, 2006
    #9
  10. Jack

    Paul Lalli Guest

    None of this shows "using a hash and array in the same way". This is
    populating an array with a list of values, and then populating a hash
    with the list of values that are currently contained in an array.
    %hash and @array have nothing to do with each other before or after
    this statement.

    Make sure you read and understand
    perldoc -q difference
    "What is the difference between a list and an array"

    Paul Lalli
     
    Paul Lalli, Sep 22, 2006
    #10

  11. Change that line to:

    my @array = ("two",22,"one",111);

    and run the program again.


    You get the _same_ output!
     
    Tad McClellan, Sep 22, 2006
    #11
  12. Jack

    Uri Guttman Guest

    PL> Faster to access, yes. Same or less memory, no.

    hmm, in general arrays have faster lookups than hashes. me thinks you
    confused the OP's statement. and arrays definitely use less storage
    (even excluding the key itself) for each element.

    PL> These are not apples to apples. You use a hash and an array for
    PL> completely different purposes.

    they are apples and bananas. :)

    PL> If you have an example of a program or algorithm you're working on, we
    PL> could help you get rid of some of your misperceptions. . . .

    or he should learn more perl before asking such deep questions. :)

    uri
     
    Uri Guttman, Sep 22, 2006
    #12
  13. Jack

    Paul Lalli Guest

    Or I confused my answer. :) What I meant was that:

    my $there = $hash{$key} ? 1 : 0;
    is faster than:
    my $there = 0;
    for my $elem (@array) {
    if ($elem eq $value) {
    $there = 1;
    next;
    }
    }

    Paul Lalli
     
    Paul Lalli, Sep 22, 2006
    #13
  14. Both are mappings. Arrays map from a contiguous list of natural numbers,
    starting with 0 (assuming you didn't fool around with $[) into scalars.
    Hashes are a generalization and map arbitrary strings to scalars.
    Very often you need organize your data in a sequence. This sequencing is
    naturally achieved by the natural order of number when they are used as
    domain of your mapping. For hashes it is possible to sort the keys of a
    hash and then access the values in sequence, but this operation is much more
    expensive.

    jue
     
    Jürgen Exner, Sep 23, 2006
    #14
  15. Not really. To access the 23rd element of an array in a typical
    implementation you would simply access
    StartOfArray + 23 * SizeOfArrayElement
    which is just adding a constant that is independant of the size of the
    array. No need to search (which would be O(n log n)).

    jue
     
    Jürgen Exner, Sep 23, 2006
    #15
  16. Jack

    Uri Guttman Guest

    PL> Or I confused my answer. :) What I meant was that:

    PL> my $there = $hash{$key} ? 1 : 0;
    PL> is faster than:
    PL> my $there = 0;
    PL> for my $elem (@array) {
    PL> if ($elem eq $value) {
    PL> $there = 1;
    PL> next;
    PL> }
    PL> }

    well then, why didn't you say that!! string key lookup is a very diff
    beast than element lookup. :)

    uri
     
    Uri Guttman, Sep 23, 2006
    #16
  17. Jack

    Uri Guttman Guest

    CM> That's true in most languages because in most languages arrays
    CM> are required to have all elements be the same type, making
    CM> looking up an element a simple matter of pointer arithmetic
    CM> (you see this in its most naked form in a higher-level language
    CM> in C, where arrays and pointers are the same thing, and array
    CM> lookups are defined in the language as pointer arithmetic.
    CM> This results in a lot of C's less cuddly quirks, but that's
    CM> another subject...). But in Perl, each element can be
    CM> entirely different and each element has its own size. There
    CM> are several different ways Perl can handle this problem; I
    CM> don't know Perl internals so I don't know how it does it.
    CM> But however it's done, it's going to take more cycles than
    CM> a simple homogenous array (and probably more space, too).

    perl's arrays ARE homogeneous but at a different level than scalar
    values. the SV's of an array are in a real c array and are indexed just
    like you say. otherwise how would perl lookup an array element? do you
    think it scans the array in some linked list format for the nth element?

    but my quoted statement is still true - perl arrays are faster and use
    less ram than hashes. this is easily tested with benchmark.pm and
    the devel::size (iirc the name correctly) module that shows ram
    usage. so you don't need to know perl internals to learn about array vs
    hash resource usage.

    uri
     
    Uri Guttman, Sep 23, 2006
    #17
  18. It seems people are just as confused about the difference between arrays
    and pointers in C as they are about the difference between arrays and
    lists in Perl :)

    Arrays and pointers in C are definitely not the same thing, An array is
    a group of elements of the same type, while a pointer, well, points to
    an element of a certain type. You notice the difference when you try to
    assign to an array (doesn't work) or when you use sizeof on a pointer
    (probably not what you wanted), and very spectacularly if you define a
    variable as "int a[10]" in one source file and declare it as "extern int
    *a" in another. However:
    This is true. (Actually it's the other way round: Pointer arithmetic is
    defined in terms as accesses to array elements)

    When you use an array as an rvalue in an expression, it is converted to
    a pointer to its first element. Thus you can use an array as an rvalue
    whereever you can use a pointer.

    (There is also the syntactic quirk that you cannot use arrays as
    parameters, but it's not an error to try - the compiler will just
    silently convert the array declaration to a pointer declaration)

    hp
     
    Peter J. Holzer, Sep 23, 2006
    #18
  19. Jack

    Dr.Ruud Guest

    Chris Mattern schreef:
    No, they are and they aren't. It all depends on what you use them for.

    <example type="bad">

    my %hary = (
    0 => q{foo},
    1 => q{bar},
    ) ;

    my @ash = (
    [q{dgd vggw ggg tgregdtg trgregfdsggt gtdeg trg}, q{foo}] ,
    [q{wgdeg ergrewg rewgrew grewg ewg ewgew gewrgewg}, q{bar}] ,
    ) ;

    </example>
     
    Dr.Ruud, Sep 23, 2006
    #19
  20. How then do you know it's item #23?
     
    Josef Moellers, Sep 25, 2006
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.