suitable key for a hash

Discussion in 'Perl Misc' started by ccc31807, Oct 12, 2010.

  1. ccc31807

    ccc31807 Guest

    I have a data file to process that consists of about 25K rows and
    about 30 columns. This file contains no column with unique values,
    that is, every column contains duplicate values. I am placing the data
    in a hash to process it (so I can access the data values by name
    rather than position), and the only 'key' I can come up with is the $.
    variable for the input line numbers.

    Surely someone must have dealt with this problem before. Is there a
    better solution?

    The processing requires dumping the data into discrete categories,
    e.g., level, state, person's name, status, for the purpose of
    generating reports, e.g., by level, by state, by name, by status, and
    not having a unique key isn't an issue.

    CC.
    ccc31807, Oct 12, 2010
    #1

  2. On 12/10/2010 17:19, ccc31807 wrote:
    > I have a data file to process that consists of about 25K rows and
    > about 30 columns. This file contains no column with unique values,
    > that is, every column contains duplicate values. I am placing the data
    > in a hash to process it (so I can access the data values by name
    > rather than position), and the only 'key' I can come up with is the $.
    > variable for the input line numbers.
    >
    > Surely someone must have dealt with this problem before. Is there a
    > better solution?


    A better solution than
    ... $name{$index} ...
    must surely be
    ... $name[$index] ...

    I don't see any point using hashes if the key value is an integer in the
    range 1..25000 with no gaps.


    > The processing requires dumping the data into discrete categories,
    > e.g., level, state, person's name, status, for the purpose of
    > generating reports, e.g., by level, by state, by name, by status, and
    > not having a unique key isn't an issue.


    An SSCCE would help.

    --
    RGB
    RedGrittyBrick, Oct 12, 2010
    #2

  3. ccc31807

    Jim Gibson Guest

    ccc31807 wrote:

    > I have a data file to process that consists of about 25K rows and
    > about 30 columns. This file contains no column with unique values,
    > that is, every column contains duplicate values. I am placing the data
    > in a hash to process it (so I can access the data values by name
    > rather than position), and the only 'key' I can come up with is the $.
    > variable for the input line numbers.
    >
    > Surely someone must have dealt with this problem before. Is there a
    > better solution?


    If you have records with duplicate keys and you want to store the data
    in a hash for rapid lookup, use array references as hash values
    (untested):

    while(<>) {
        my( $name, @rest ) = split;
        push( @{$data{$name}}, \@rest );
    }

    >
    > The processing requires dumping the data into discrete categories,
    > e.g., level, state, person's name, status, for the purpose of
    > generating reports, e.g., by level, by state, by name, by status, and
    > not having a unique key isn't an issue.


    Store the data in an array and create indices for key fields (untested):

    while(<>) {
        my @fields = split;
        push( @data, \@fields );
        # $#data is the index of the row just pushed
        push( @{$field1_index{$fields[0]}}, $#data );
        push( @{$field2_index{$fields[1]}}, $#data );
        ...
    }
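
    A self-contained sketch of the same idea, with a lookup at the end.
    The sample rows and the column meanings (state, level, name) are
    made up for illustration, and the index names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample rows: state, level, name (whitespace-separated).
my @lines = (
    "NY senior Alice",
    "CA junior Bob",
    "NY junior Carol",
);

my ( @data, %state_index, %level_index );

for my $line (@lines) {
    my @fields = split ' ', $line;
    push @data, \@fields;                            # the row itself
    push @{ $state_index{ $fields[0] } }, $#data;    # row number, keyed by state
    push @{ $level_index{ $fields[1] } }, $#data;    # row number, keyed by level
}

# Fetch every row for one state through the index.
my @ny_names = map { $data[$_][2] } @{ $state_index{NY} };
print "NY: @ny_names\n";    # prints "NY: Alice Carol"
```

    Each row is stored once; the indices hold only row numbers, so a
    row can appear under as many keys as needed.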

    --
    Jim Gibson
    Jim Gibson, Oct 12, 2010
    #3
  4. ccc31807 wrote:
    > I have a data file to process that consists of about 25K rows and
    > about 30 columns. This file contains no column with unique values,
    > that is, every column contains duplicate values.



    Jointly, or just severally?


    > I am placing the data
    > in a hash to process it (so I can access the data values by name
    > rather than position),


    If you wish to access it by name, then you must know what the name is.

    > and the only 'key' I can come up with is the $.
    > variable for the input line numbers.


    Why not just an array, in that case?

    >
    > Surely someone must have dealt with this problem before. Is there a
    > better solution?
    >
    > The processing requires dumping the data into discrete categories,
    > e.g., level, state, person's name, status, for the purpose of
    > generating reports, e.g., by level, by state, by name, by status, and
    > not having a unique key isn't an issue.


    Ok, so just stick it directly into those structures.

    Xho
    Xho Jingleheimerschmidt, Oct 13, 2010
    #4
  5. ccc31807

    Justin C Guest

    On 2010-10-12, ccc31807 wrote:
    > I have a data file to process that consists of about 25K rows and
    > about 30 columns. This file contains no column with unique values,
    > that is, every column contains duplicate values. I am placing the data
    > in a hash to process it (so I can access the data values by name
    > rather than position), and the only 'key' I can come up with is the $.
    > variable for the input line numbers.
    >
    > Surely someone must have dealt with this problem before. Is there a
    > better solution?
    >
    > The processing requires dumping the data into discrete categories,
    > e.g., level, state, person's name, status, for the purpose of
    > generating reports, e.g., by level, by state, by name, by status, and
    > not having a unique key isn't an issue.


    Instead of sticking it into a hash so that you can go over all of it
    again, why not process (or part process) it into the relevant discrete
    categories as part of the import?
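
    Something along these lines might do, as a rough sketch. The column
    names, the tab separator, and the sample records are assumptions,
    not taken from the real file:

```perl
use strict;
use warnings;

# Hypothetical column names and sample records, tab-separated.
my @columns = qw(level state name status);
my @lines = (
    "1\tNY\tAlice\tactive",
    "2\tCA\tBob\tinactive",
    "1\tCA\tCarol\tactive",
);

my ( %by_level, %by_state );

for my $line (@lines) {
    my %row;
    @row{@columns} = split /\t/, $line;    # hash slice: name each column
    # File the same record under each category as it is read.
    push @{ $by_level{ $row{level} } }, \%row;
    push @{ $by_state{ $row{state} } }, \%row;
}

# One report: names grouped by state.
for my $state ( sort keys %by_state ) {
    my @names = map { $_->{name} } @{ $by_state{$state} };
    print "$state: @names\n";
}
```

    No per-record key is needed at all: the category values are the
    hash keys, and each one holds a list of records.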

    Justin.
    --
    Justin C, by the sea.
    Justin C, Oct 13, 2010
    #5
  6. ccc31807

    ccc31807 Guest

    Thanks for your reply, and for all the others.

    I decided to continue to use $. as the hash key. As it turns out, the
    key isn't relevant to my application, as I'm not using the key to look
    up the hash values. I'm just iterating through the hash, collecting
    certain values, so the key is totally superfluous -- the only reason I
    need a key is because of the nature of the hash.

    I don't want to use an array because I'm creating a number of
    different reports, and it's simply a lot easier to use values like:

    $data{$key}{firstname}, $data{$key}{lastname}

    than it is to use values like

    $data[13456][2], $data[23543][3]

    On Oct 12, 1:03 pm, RedGrittyBrick wrote:

    > An SSCCE would help.


    I'm sorry, but I don't know this. What is an SSCCE?

    CC
    ccc31807, Oct 13, 2010
    #6
  7. ccc31807

    Dr.Ruud Guest

    On 2010-10-13 15:37, ccc31807 wrote:

    > I decided to continue to use $. as the hash key.


    If it smells like an array index ...


    > As it turns out, the
    > key isn't relevant to my application, as I'm not using the key to look
    > up the hash values. I'm just iterating through the hash, collecting
    > certain values, so the key is totally superfluous -- the only reason I
    > need a key is because of the nature of the hash.
    >
    > I don't want to use an array because I'm creating a number of
    > different reports, and it's simply a lot easier to use values like:
    >
    > $data{$key}{firstname}, $data{$key}{lastname}
    >
    > than it is to use values like
    >
    > $data[13456][2], $data[23543][3]


    That is not the proper comparison.

    $data[ $row ]{ firstname }

    $data[ $row ][ FIRSTNAME ]

    (assumes a numeric constant FIRSTNAME)
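
    A small sketch of both forms side by side. The sample names and the
    column positions are made up, and LASTNAME is an extra constant
    added only for the example:

```perl
use strict;
use warnings;

# Hypothetical column positions for the array-of-arrays form.
use constant { FIRSTNAME => 0, LASTNAME => 1 };

my @rows = ( [ 'Ada', 'Lovelace' ], [ 'Alan', 'Turing' ] );

# Array of arrays: rows by number, fields by named constant.
my @aoa = @rows;

# Array of hashes: rows by number, fields by name.
my @aoh = map { +{ firstname => $_->[0], lastname => $_->[1] } } @rows;

my $row = 1;
print join( ' ', $aoh[$row]{firstname}, $aoh[$row]{lastname} ), "\n";
print join( ' ', $aoa[$row][FIRSTNAME], $aoa[$row][LASTNAME] ), "\n";
```

    Either way the outer container is an array, and the fields are
    still addressed by name rather than by a bare number.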


    > What is an SSCCE?


    JFGI

    --
    Ruud
    Dr.Ruud, Oct 13, 2010
    #7
  8. ccc31807 wrote:
    >I don't want to use an array because I'm creating a number of
    >different reports, and it's simply a lot easier to use values like:
    >
    >$data{$key}{firstname}, $data{$key}{lastname}
    >
    >than it is to use values like
    >
    >$data[13456][2], $data[23543][3]


    And why not use values like

    $data[$key]{firstname}, $data[$key]{lastname}

    jue
    Jürgen Exner, Oct 13, 2010
    #8
  9. ccc31807

    ccc31807 Guest

    On Oct 13, 11:08 am, Jürgen Exner wrote:
    > And why not use values like
    >
    >         $data[$key]{firstname}, $data[$key]{lastname}


    Because I wasn't completely truthful about my processing. I have to
    break the data apart on various values, some of which are unique keys,
    e.g., identification numbers for individual people. The data includes
    clients and counselors, and (obviously) clients can have multiple
    counselors and counselors can have multiple clients. Other values are
    one of a kind, such as a person's address, regardless of the number of
    times the particular person appears in the data. I have to cross
    reference these values by unique keys, and I use five hashes to sort
    out the data.
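
    Roughly, the cross-referencing looks like the sketch below. The IDs,
    addresses, and hash names here are invented for illustration and
    are not the real data or the real five hashes:

```perl
use strict;
use warnings;

# Hypothetical rows: client ID, client address, counselor ID.
my @rows = (
    [ 'C1', '12 Oak St', 'K1' ],
    [ 'C1', '12 Oak St', 'K2' ],
    [ 'C2', '9 Elm Ave', 'K1' ],
);

my ( %address_of, %counselors_of, %clients_of );

for my $r (@rows) {
    my ( $client, $address, $counselor ) = @$r;
    $address_of{$client} = $address;            # one-of-a-kind value per person
    $counselors_of{$client}{$counselor} = 1;    # hash used as a set: no duplicates
    $clients_of{$counselor}{$client}    = 1;    # the reverse direction
}

# Report: clients per counselor.
for my $counselor ( sort keys %clients_of ) {
    my @clients = sort keys %{ $clients_of{$counselor} };
    print "$counselor: @clients\n";
}
```

    The unique IDs serve as the hash keys, so a person who appears on
    many rows still ends up with a single address entry.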

    I see now that I could use an array for the handful of data elements
    for each row that are unique.

    Thanks, CC.
    ccc31807, Oct 13, 2010
    #9