Help Needed Building Array Of Hashes From CSV


Tim Sheets

Hello group,

I am rather new to Perl (and programming in general) and am having a
problem building a hash. I have read several tutorials, examples, etc..
but nothing touches on exactly what I am trying to do.

I am trying to read data in from a CSV file (Excel export) and store the
information in an array of hashes. The first line of the file contains
field names, and the following lines contain data.

So, I am reading the first line into an array, splitting on the commas,
to use as the hash keys. Then all subsequent lines are read into
another array, splitting on the commas to be used as the values.

Then I am trying to build another array, and combine the 'keys' and
'values' from the first two arrays as a hash.

Here is the code I am trying to use:

# Begin reading csv dump
open (REFERENCEFILE,"$ARGV[1]") ;

$count=0;

while (<REFERENCEFILE>) {
    if ($count == 0) {
        @refdatakeys = split(/,/, $_);
        $numelements = scalar (@refdatakeys);
    } else {
        @refdatavalues = split(/,/, $_);
        for ($i=0; $i < $numelements; $i++) {
            @arrefdata = ({$refdatakeys[$i] => '$refdatavalues[$i]'});
        }
    }

    print ("SITE ID is: $arrefdata[$count]{'SITE ID'}\n");
    $count++;
}

If I put print statements to test the values in @refdatakeys and
@refdatavalues, everything is stored in those two arrays. But, I can't
figure out what I am doing wrong when trying to store those values as
key/value pairs in @arrefdata. Or, I guess it's possible that values
are stored, but I am not correctly accessing that data in my print
statement.

If anyone can figure out what I am doing wrong, I sure would appreciate
some help.

Thanks all!!

Tim
 

Jim Cochrane

Tim said:
Hello group,

I am rather new to Perl (and programming in general) and am having a
problem building a hash. I have read several tutorials, examples, etc..
but nothing touches on exactly what I am trying to do.

I am trying to read data in from a CSV file (Excel export) and store the
information in an array of hashes. The first line of the file contains
field names, and the following lines contain data.

So, I am reading the first line into an array, splitting on the commas,
to use as the hash keys. Then all subsequent lines are read into
another array, splitting on the commas to be used as the values.

Then I am trying to build another array, and combine the 'keys' and
'values' from the first two arrays as a hash.

I could read your code and guess, but instead, it might save time
to first ask: What exactly are you trying to do? You're describing
the problem in terms of an implementation, but I don't see any precise
problem description mentioned or implied from the above description (maybe
imprecise, but that's not good enough for a computer). In other words,
what are the requirements for your problem? (If you decide to spend
the time to become good at programming, you'll learn that being able to
state the problem without mentioning or implying an implementation is
one important skill you will need.)

Once your requirements are clear, people can then help you find a solution -
an implementation.
 

George Kinley

Not asking to piss you off, but just curious:
Why did you think of using a hash?
 

Tore Aursand

open (REFERENCEFILE,"$ARGV[1]") ;

Always check the outcome of an 'open()', and drop those unneeded double
quotes:

open( REFERENCEFILE, '<', $ARGV[1] ) or die "$!\n";

$count=0;

You're not using 'strict' (and possibly not 'warnings' either). Make sure
your script starts with (something like):

#!/usr/bin/perl
#
use strict;
use warnings;

When using 'strict', you need to declare your variables for the current
scope before you use them. Example:

my $count = 0;

while (<REFERENCEFILE>) {
    if ($count == 0) {
        @refdatakeys = split(/,/, $_);
        $numelements = scalar (@refdatakeys);
    } else {
        @refdatavalues = split(/,/, $_);
        for ($i=0; $i < $numelements; $i++) {
            @arrefdata = ({$refdatakeys[$i] => '$refdatavalues[$i]'});
        }
    }
    print ("SITE ID is: $arrefdata[$count]{'SITE ID'}\n");
    $count++;
}

Instead of using your own routine to parse CSV, you should consider one of
the CSV modules on CPAN - <http://www.cpan.org/> - or Text::ParseWords
(which comes with Perl).

If you want to study a regular expression example which does the job in the
majority of cases, see 'perldoc -q split'.

No need to use the 'scalar()' function. Using an array in scalar
context will return its "length":

my $length = @array;

Consider comparing the "length" of @refdatakeys and @refdatavalues in the
code above, and consider doing something "smart" when there's a "length"
mismatch.
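
The points above can be put together in a minimal sketch. It assumes made-up sample data (read from an in-memory string so it runs without a file on disk), uses parse_line() from Text::ParseWords so that a quoted field containing a comma survives, and simply sets aside any line whose field count differs from the header. Skipping is only one possible reaction to a mismatch; the variable names are illustrative too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample data standing in for the Excel export.
my $csv = <<'END';
SITE ID,NAME,CITY
A100,"Smith, John",Austin
A200,Jones
END

# Read from an in-memory string; a real script would open its file here
# and check the result of open() in the same way.
open( my $fh, '<', \$csv ) or die "Cannot open sample data: $!\n";

# The first line holds the field names.
my $header = <$fh>;
chomp $header;
my @keys = parse_line( ',', 0, $header );

my ( @rows, @skipped );
while ( my $line = <$fh> ) {
    chomp $line;
    my @values = parse_line( ',', 0, $line );

    # One possible reaction to a "length" mismatch: remember the line
    # number and move on.
    if ( @values != @keys ) {
        push @skipped, $.;
        next;
    }

    my %row;
    @row{@keys} = @values;
    push @rows, \%row;
}
close $fh;

print "NAME is: $rows[0]{NAME}\n";
print "Skipped line(s): @skipped\n";
```

Note that parse_line() treats apostrophes as quote characters as well, so data with a lone apostrophe may need one of the dedicated CSV modules instead.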
 

Joe Smith

Tim said:
for ($i=0; $i < $numelements; $i++) {
@arrefdata = ({$refdatakeys[$i] => '$refdatavalues[$i]'});
}

I expect that you meant
$arrefdata[$count] = { key1,val1, key2,val2, key3,val3, ...}

You ought to create dummy column labels if any row has more items than
the first row of the file. Use map{}() to build a list of key/value pairs.

my @temp = map { ( ($_ < @refdatakeys ? $refdatakeys[$_] : "unknown $_"),
$refdatavalues[$_] ) } (0 .. $#refdatavalues);
$arrefdata[$count] = { @temp };
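
A self-contained sketch of the map approach, with the index list written directly after the block. The header and the data row are made up for illustration, and the row deliberately has one field more than the header so that a dummy label gets generated.

```perl
use strict;
use warnings;

# Hypothetical header and one data row that has an extra field.
my @refdatakeys   = ( 'SITE ID', 'NAME' );
my @refdatavalues = ( 'A100', 'Main Office', 'extra' );

# Pair each value with its column label, inventing a label for any
# position past the end of the header.
my @temp = map {
    my $key = $_ < @refdatakeys ? $refdatakeys[$_] : "unknown $_";
    ( $key, $refdatavalues[$_] );
} 0 .. $#refdatavalues;

# A flat key/value list turns straight into a hash.
my %row = @temp;
print "$_ => $row{$_}\n" for sort keys %row;
```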

-Joe
 

Anno Siegel

Tim Sheets said:
Hello group,

I am rather new to Perl (and programming in general) and am having a
problem building a hash. I have read several tutorials, examples, etc..
but nothing touches on exactly what I am trying to do.

I am trying to read data in from a CSV file (Excel export) and store the
information in an array of hashes. The first line of the file contains
field names, and the following lines contain data.

So, I am reading the first line into an array, splitting on the commas,
to use as the hash keys. Then all subsequent lines are read into
another array, splitting on the commas to be used as the values.

As has been noted, there are modules on CPAN that deal with CSV.

Then I am trying to build another array, and combine the 'keys' and
'values' from the first two arrays as a hash.

Here is the code I am trying to use:

# Begin reading csv dump
open (REFERENCEFILE,"$ARGV[1]") ;

The quotes around "$ARGV[1]" do nothing useful. They shouldn't be there.

$count=0;

You're not running under "strict", are you? You should, and switch
warnings on too.

while (<REFERENCEFILE>) {
    if ($count == 0) {
        @refdatakeys = split(/,/, $_);

The last key will have a linefeed tacked on. You ought to chomp
the line first.

        $numelements = scalar (@refdatakeys);

It would be easier to read the first line by itself and not bother
with the line count.

"scalar" is redundant in the assignment of $numelements, it is already
in scalar context. But you don't need $numelements at all (see below).

    } else {
        @refdatavalues = split(/,/, $_);

Again, there will be a linefeed appended to the last value.

        for ($i=0; $i < $numelements; $i++) {

You don't need an indexed loop if you build the individual hashes
correctly.

            @arrefdata = ({$refdatakeys[$i] => '$refdatavalues[$i]'});

This assigns a single hashref (which contains a single key/value pair)
to @arrefdata. @arrefdata will never accumulate anything that way.
This is probably the central problem of your code.

        }
    }

    print ("SITE ID is: $arrefdata[$count]{'SITE ID'}\n");
    $count++;
}
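
Two effects of the quoted assignment line can be seen in isolation with a small sketch. The sample values are made up. It shows that a single-quoted '$refdatavalues[$i]' is stored as literal text rather than as the array element, and that assigning a list to the whole array replaces whatever was there before.

```perl
use strict;
use warnings;

# Hypothetical keys and values for one row.
my @refdatakeys   = ( 'SITE ID', 'NAME' );
my @refdatavalues = ( 'A100', 'Main Office' );

# Single quotes: the value is the literal text, not the array element.
my %literal = ( $refdatakeys[0] => '$refdatavalues[0]' );

# No quotes: the value is the array element itself.
my %actual = ( $refdatakeys[0] => $refdatavalues[0] );

# Assigning to the whole array replaces its contents on every pass,
# so only the hashref from the last pass survives.
my @arrefdata;
for my $i ( 0 .. $#refdatakeys ) {
    @arrefdata = ( { $refdatakeys[$i] => $refdatavalues[$i] } );
}

print "literal: $literal{'SITE ID'}\n";
print "actual:  $actual{'SITE ID'}\n";
print "elements kept: ", scalar(@arrefdata), "\n";
```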

So here is a way to do what you want:

my ( @refdatakeys, @arrefdata);

for ( scalar <REFERENCEFILE> ) {
    chomp;
    @refdatakeys = split /,/;
}

while ( <REFERENCEFILE> ) {
    chomp;
    my %row;
    @row{ @refdatakeys} = split /,/;
    push @arrefdata, \ %row;
}
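
For anyone who wants to try the hash-slice approach without the original file, here is a runnable sketch. It assumes made-up sample data read from an in-memory string and a lexical filehandle in place of REFERENCEFILE; the rest follows the same steps as the code above.

```perl
use strict;
use warnings;

# Hypothetical sample data in place of the real export.
my $csv = "SITE ID,NAME\nA100,Main Office\nA200,Warehouse\n";
open( my $fh, '<', \$csv ) or die "Cannot open sample data: $!\n";

# First line: the field names.
chomp( my $header = <$fh> );
my @refdatakeys = split /,/, $header;

# Remaining lines: one hash per row, filled with a hash slice.
my @arrefdata;
while ( my $line = <$fh> ) {
    chomp $line;
    my %row;
    @row{@refdatakeys} = split /,/, $line;
    push @arrefdata, \%row;    # %row is new on each pass, so each ref is distinct
}
close $fh;

# Each element is a hash reference, so fields are reached by name.
print "SITE ID is: $_->{'SITE ID'}\n" for @arrefdata;
```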

Anno
 

Tim Sheets

Jim said:
I could read your code and guess, but instead, it might save time
to first ask: What exactly are you trying to do? You're describing
the problem in terms of an implementation, but I don't see any precise
problem description mentioned or implied from the above description (maybe
imprecise, but that's not good enough for a computer). In other words,
what are the requirements for your problem? (If you decide to spend
the time to become good at programming, you'll learn that being able to
state the problem without mentioning or implying an implementation is
one important skill you will need.)

Once your requirements are clear, people can then help you find a solution -
an implementation.

Jim,

You may have a point there regarding my ability to "properly" ask a
question, but reading the above text, I am completely confused. Maybe I
am just dense, but about all I can understand is you aren't happy with
the presentation of my question/problem.

I thought I did a pretty good job explaining what I was trying to do,
and what was working, and what wasn't working (the latter two being
after the code I posted). Maybe I was too verbose, I dunno.

Thanks anyway,

Tim
 

Tim Sheets

George said:
Not asking to piss you off, but just curious:
Why did you think of using a hash?

It just seemed to make sense. Would be easier to reference the fields
of the CSV with names, rather than index numbers. If the CSV file
format changes, I think it would be easier to tweak the code if names
were used instead of the index numbers.
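
A small sketch of that difference, using a made-up record stored both ways (the field names and values are only for illustration):

```perl
use strict;
use warnings;

# The same hypothetical record stored two ways.
my @by_index = ( 'A100', 'Main Office', 'Austin' );
my %by_name  = ( 'SITE ID' => 'A100', NAME => 'Main Office', CITY => 'Austin' );

# Index access depends on the column order in the file.
print "City (index): $by_index[2]\n";

# Name access keeps working if columns are added or reordered.
print "City (name):  $by_name{CITY}\n";
```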

Maybe you have a better way in mind?? If so, I am open to suggestions.

Tim
 

Tim Sheets

Tore said:
open (REFERENCEFILE,"$ARGV[1]") ;


Always check the outcome of an 'open()', and drop those unneeded double
quotes;

open( REFERENCEFILE, '<', $ARGV[1] ) or die "$!\n";

You're correct. I should be doing this.

You're not using 'strict' (and possibly not 'warnings' either). Make sure
your script starts with (something like):

#!/usr/bin/perl
#
use strict;
use warnings;

right again. :)


Instead of using your own routine to parse CSV, you should consider one of
the CSV modules on CPAN - <http://www.cpan.org/> - or Text::ParseWords
(which comes with Perl).

I thought of that, but decided to try to make it happen myself. More
portable if you don't have to install an additional module, and as much
as anything, I have never been able to get an array of hashes to work
beyond tutorial examples I have found, and just wanted to tackle it
again. One of these days I will figure it out.

Consider comparing the "length" of @refdatakeys and @refdatavalues in the
code above, and consider doing something "smart" when there's a "length"
mismatch.

hmmmm....good point. I think in my particular case, it won't matter, if
I have an empty field, I want to keep it that way, but I can see where
this wouldn't always be the case.

Tim
 

Tim Sheets

Joe said:
Tim said:
for ($i=0; $i < $numelements; $i++) {
@arrefdata = ({$refdatakeys[$i] => '$refdatavalues[$i]'});
}


I expect that you meant
$arrefdata[$count] = { key1,val1, key2,val2, key3,val3, ...}

You ought to create dummy column labels if any row has more items than
the first row of the file. Use map{}() to build a list of key/value pairs.

my @temp = map { ( ($_ < @refdatakeys ? $refdatakeys[$_] : "unknown $_"),
$refdatavalues[$_] ) } (0 .. $#refdatavalues);
$arrefdata[$count] = { @temp };

-Joe

Thanks, Joe, I'll look at that. Could be very handy...

Tim
 

Tore Aursand

I thought of that, but decided to try to make it happen myself.

That is good if you intend to learn, but never try to reinvent the wheel
when you're done learning.

More portable if you don't have to install an additional module [...]

Text::ParseWords comes with the standard Perl distribution, and what's the
problem with installing your own modules? You don't have to be superuser
to be able to do that.
 

Jim Cochrane

Tim said:
Jim,

You may have a point there regarding my ability to "properly" ask a
question, but reading the above text, I am completely confused. Maybe I
am just dense, but about all I can understand is you aren't happy with
the presentation of my question/problem.

I thought I did a pretty good job explaining what I was trying to do,
and what was working, and what wasn't working (the latter two being
after the code I posted). Maybe I was too verbose, I dunno.

Sorry, Tim - I was probably being a little overly didactic, but I was
trying to point out that you didn't really say why you are trying to build
an array of hashes - that is, what you were going to do with the array
once you built it. In other words, if you can specify a problem in terms
of what the program should do, from the perspective of a user of the
program who has no idea of what the code looks like, you can often be clear
as to what the problem actually is, both in terms of your own thinking
and for presenting it to others. Often this helps not only in finding
a correct implementation, but in finding a better one (e.g., more efficient,
easier to maintain, etc.)

I do maintain that to become truly good at programming, it's necessary
to be able to state a problem without mentioning or implying an
implementation (e.g., without talking about hashes, arrays, etc.).
However, becoming a professional programmer may not be your goal, which is
why I was perhaps being overly didactic.
 

Eric Bohlman

You may have a point there regarding my ability to "properly" ask a
question, but reading the above text, I am completely confused. Maybe
I am just dense, but about all I can understand is you aren't happy
with the presentation of my question/problem.

I thought I did a pretty good job explaining what I was trying to do,
and what was working, and what wasn't working (the latter two being
after the code I posted). Maybe I was too verbose, I dunno.

Jim was describing what's commonly known in technical groups as an "XY
problem." Someone has a task (goal, end) that they want to accomplish.
That's the "X." They've also decided that they're going to accomplish the
task in a particular fashion (implementation, means). That's the "Y." The
problem occurs when that person comes to the group and describes Y without
describing X. It's a problem because in many cases there's a much better
(often much simpler) way to accomplish X than Y. And it's also a problem
because without knowledge of X, people have a hard time following your Y.
You won't notice this yourself, because you can take X for granted, but
what seems obvious to you won't be at all obvious to your readers because
they have no context.

So what really happened here was that you did a pretty good job of
explaining the details of how you were going about trying to solve an
unspecified problem. That's simply not enough for most people to work
with. What Jim wants you to do (and I agree that it's a very important
skill, albeit one that doesn't come easily; you really do have to work hard
at developing it) is learn to describe your goal (i.e. the true problem
you're trying to solve) in its own terms, rather than in terms of the way
you've tried to solve it. In other words, say *what* you're trying to do
as well as *how* you're trying to do it.

This isn't only important when you're asking other people for help; it's
also important when you're working by yourself. If you start to dive into
the implementation details before you've got a clear picture of the problem
to solve (and believe me, it's *very* tempting to do so), you'll very
easily find yourself on the wrong track, with the likely result that you'll
spend a long time writing and debugging code that you'll eventually have to
scrap and redo. And even worse, at one point in the process you may wind
up with a program that *seems* to work, to the point where you start
relying on it, but winds up failing (often silently) for a few important
special cases.
 

Tim Sheets

Anno said:
You're not running under "strict", are you? You should, and switch
warnings on too.

Wasn't, but am now. :)

The last key will have a linefeed tacked on. You ought to chomp
the line first.

You got that right!! Actually, I was doing that in another part of the
script where I am massaging the fields to meet some formatting requirements.

@arrefdata = ({$refdatakeys[$i] => '$refdatavalues[$i]'});


This assigns a single hashref (which contains a single key/value pair)
to @arrefdata. @arrefdata will never accumulate anything that way.
This is probably the central problem of your code.

Yes, it was.

So here is a way to do what you want:

my ( @refdatakeys, @arrefdata);

for ( scalar <REFERENCEFILE> ) {
    chomp;
    @refdatakeys = split /,/;
}

while ( <REFERENCEFILE> ) {
    chomp;
    my %row;
    @row{ @refdatakeys} = split /,/;
    push @arrefdata, \ %row;
}

Works great!! I followed your suggestions and dropped this in place
which got me past this hurdle.

I finally got everything finished up today....it may not be pretty, but
it works. :)

Thanks for your help!!!!

Tim
 

Tim Sheets

Jim Cochrane wrote:

Sorry, Tim - I was probably being a little overly didactic, but I was
trying to point out that you didn't really say why you are trying to build
an array of hashes - that is, what you were going to do with the array
once you built it. In other words, if you can specify a problem in terms
of what the program should do, from the perspective of a user of the
program who has no idea of what the code looks like, you can often be clear
as to what the problem actually is, both in terms of your own thinking
and for presenting it to others. Often this helps not only in finding
a correct implementation, but in finding a better one (e.g., more efficient,
easier to maintain, etc.)

Thanks for the clarification, Jim. I will keep this in mind in future
posts.

I do maintain that to become truly good at programming, it's necessary
to be able to state a problem without mentioning or implying an
implementation (e.g., without talking about hashes, arrays, etc.).
However, becoming a professional programmer may not be your goal, which is
why I was perhaps being overly didactic.

While I would like to be a great programmer, it isn't my primary
function at work, and don't really want to be a programmer by
profession. The way I tell it is "I want to know enough programming to
be able to do what I want to do". :) Of course that may change from
day to day... So, I end up writing a little script to accomplish a
specific task about once every 3 or 4 months.

Anyway, thanks again for the pointers on posting questions from the
perspective of the problem, and what I am trying to accomplish, as
opposed to an implementation perspective.

Tim
 

Jim Cochrane

Tim said:
Thanks for the clarification, Jim. I will keep this in mind in future
posts.

You're welcome. I think Eric Bohlman did a good job of clarifying the
issue further in his post. You might want to read it if you haven't
already.

While I would like to be a great programmer, it isn't my primary
function at work, and don't really want to be a programmer by
profession. The way I tell it is "I want to know enough programming to
be able to do what I want to do". :) Of course that may change from
day to day... So, I end up writing a little script to accomplish a
specific task about once every 3 or 4 months.

I understand. And that's a very reasonable goal.

Anyway, thanks again for the pointers on posting questions from the
perspective of the problem, and what I am trying to accomplish, as
opposed to an implementation perspective.

IMO, if you put some effort into learning better how to do this, it will
give you a useful edge.
 
