Perl HASH problem

  • Thread starter Shripathi Guruprasannaraj
  • Start date
S

Shripathi Guruprasannaraj

Hi,

The function am trying to write reads strings from a file. The file's
contents are numbers like 00101201. I am inserting them into a hashtable
removing the duplicates. I then write the keys into a file. The output
file contains a HASH(0x1169e4) string but the other keys are written
properly. I guess its actually a reference but I dont know how to
eliminate it. I am giving the code sample below.

foreach $line (@filecontents) {

($key,$value) = split(" ",$line);
if($key =~ m/^$year/) {
if (! defined($nodes{"$key"})) {
$nodes{"$key"} = 1;
}
}
}

@keys = keys %nodes;
foreach(@keys) {
print outputfile "\n$_";
}


Please give me some suggestions and also why its happening.

Thanks,

Raj
 
J

James Willmore

Hi,

The function am trying to write reads strings from a file. The
file's
contents are numbers like 00101201. I am inserting them into a
hashtable removing the duplicates. I then write the keys into a
file. The output file contains a HASH(0x1169e4) string but the other
keys are written properly. I guess its actually a reference but I
dont know how to eliminate it. I am giving the code sample below.

foreach $line (@filecontents) {

($key,$value) = split(" ",$line);
if($key =~ m/^$year/) {
if (! defined($nodes{"$key"})) {
$nodes{"$key"} = 1;
}
}
}

foreach my $line(@filecontents){
my($key, $value) = split(" ", $line);
$nodes{"$key"}++ if($key =~ /$year/);
}

This does two things:
1) matches the regular expression
2) increments the $key value of the nodes (since it appears you don't
care/want the $value).

You don't need 'm' for the regular expression usless you're going to
extract something from the match.

OTOH, if you *are* trying to do a match, then

$nodes{"$key"} = 1;

should be
$nodes{"$key"} = "$1";

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
I can't understand why a person will take a year or two to write
a novel when he can easily buy one for a few dollars. -- Fred
<Allen
 
B

Bob Walton

Shripathi Guruprasannaraj wrote:

....
The function am trying to write reads strings from a file. The file's
contents are numbers like 00101201. I am inserting them into a hashtable
removing the duplicates. I then write the keys into a file. The output
file contains a HASH(0x1169e4) string but the other keys are written
properly. I guess its actually a reference but I dont know how to
eliminate it. I am giving the code sample below.


I don't get a line with a stringified hash reference (the HASH(0x...)
above) when I run your code below. But, since you asked for suggestions...



You are missing:

use strict;
use warnings;

unless you just didn't show them. But then you'd need some "my" statements.

foreach $line (@filecontents) {

($key,$value) = split(" ",$line);


Given data like you say you have, split won't have anything to split on,
so the while line will end up in $key. So why the split? You're not
using it in place of chomp to remove the trailing newline, are you?

if($key =~ m/^$year/) {
if (! defined($nodes{"$key"})) {

------------^^^^^^^
---------------------------^----^
Here you probably want exists rather than defined. But why bother with
the test at all? The only thing you are setting the hash values to is
1, and if you set a key's value to 1 more than once, so what? Also, you
have a useless use of a double-quoted string, which is bad style.


Also, your indenting style, or lack thereof, makes it hard

to follow your if statements and loop.
$nodes{"$key"} = 1;

---------------^----^
Ditto on the useless use of a double-quoted string.

Also, note that your whole loop could be replaced with:

@nodes{@filecontents}=1;

since split() isn't doing anything.

}
}
}

@keys = keys %nodes;
foreach(@keys) {
print outputfile "\n$_";

------------------------^^^^
Are you sure you want the newline and your data in this order? You will
get an initial blank line. Also, it is widely accepted style to use
UPPERCASE for filehandles -- that avoids the potential to use reserved
words (possibly in future versions of Perl -- no one knows what they
will be, except they will not be uppercase) for filehandles. Also, you
could use:

print OUTPUTFILE join "\n",keys %nodes;

and made quicker work of it. If the hash were huge, though, you would want:

while(($keys,$values)=each(%nodes)){
print OUTPUTFILE "$keys\n";
}

although I guess it isn't, since you input your whole file into an array
in the first place.
 
S

Shripathi Guruprasannaraj

Hi,

I understand that my Perl code looks bad. I have been trying to work on
it for a while but since am mostly just processing text, I use Perl
fleetingly. Anyway I am attaching a sample input file. Actually I dint
have the double-quotes over the keys before. I thought it would make a
difference. It dint and I dint bother to change it before sending it
across. I am trying to get only unique keys into the hash table. The value
is just a dummy value. In the data file am sending, the first column is
the same. But in the actual file, it increases. The file is actually a
citation graph with the first column indicating the id of the paper that
cites a paper and the second column indicating the id of the paper its
citing to. I am trying to get the unique ids in the first column.


Raj
 
S

Shripathi Guruprasannaraj

Shripathi said:
Hi,

The function am trying to write reads strings from a file. The file's
contents are numbers like 00101201. I am inserting them into a hashtable
removing the duplicates. I then write the keys into a file. The output
file contains a HASH(0x1169e4) string but the other keys are written
properly. I guess its actually a reference but I dont know how to
eliminate it. I am giving the code sample below.

foreach $line (@filecontents) {

($key,$value) = split(" ",$line);
if($key =~ m/^$year/) {
if (! defined($nodes{"$key"})) {
$nodes{"$key"} = 1;
}
}
}

@keys = keys %nodes;
foreach(@keys) {
print outputfile "\n$_";
}


Please give me some suggestions and also why its happening.

Thanks,

Raj


Following Some of your suggestions , I made the changes to my code. Let
me know if I can improvise further.

#!/usr/bin/perl

use strict;
use warnings;


my $year = shift @ARGV;
my $file = shift @ARGV;
getNodes($year,$file);

sub getNodes() {

my $year = shift;
my $file = shift;

open(FILEHANDLE,"citationgraph") or die "Cannot open file $!\n";
open(OUTPUTFILE, ">$file") or die "Cannot open file $!\n";

my @filecontents = <FILEHANDLE>;
my %nodes = {};

foreach my $line (@filecontents) {

my ($key,$value) = split(" ",$line);
if($key =~ m/^$year/) {
$key =~ s/\s+$//;
if (! defined($nodes{$key})) {
$nodes{$key} = 1;
}
}
}

my @keys = keys %nodes;
foreach(@keys) {
print OUTPUTFILE "\n$_";
}

}
 
B

Bob Walton

Shripathi said:
I forgot to attach the sample input file

Looks like you forget to attach it again. You might have better luck if
you just copy/pasted it in -- hopefully just a few representative lines
of it. And while you're at it, put it with your program so the whole
thing can be copy/pasted/executed by anyone, something along the lines of:

#!/usr/bin/perl
use strict;
use warnings;
my @data=<DATA>;
chomp @data;
#do stuff with @data
print join "\n",@data;
__END__
data line1
data line2
data line3

and make sure the combo demonstrates the problem you are having when one
copy/paste/executes it. Thanks!
 
B

Ben Morrow

James Willmore said:
You don't need 'm' for the regular expression usless you're going to
extract something from the match.

OTOH, if you *are* trying to do a match, then

$nodes{"$key"} = 1;

should be
$nodes{"$key"} = "$1";

What?! This is complete nonsense.

You may omit the 'm' on a m// match if and only if the delimiter you
use is '/'; i.e. 'm|\d|' needs the 'm' but 'm/\d/' could have been
simply written '/\d/'. There is otherwise no significance to it.

$1 refers to what the first bracketed subexpression of the last
*successful* pattern match matched. So

$_ = "ab";
/(a)/; # Now $1 eq 'a'.
/(c)/; # $1 eq 'a' still, as the pattern didn't match.
/b/; # $1 is undef, as the pattern matched but didn't have any
# capturing parentheses.

Please don't talk rubbish.

Ben
 
B

Bob Walton

Shripathi said:
Shripathi Guruprasannaraj wrote:
....

Following Some of your suggestions , I made the changes to my code. Let
me know if I can improvise further.

#!/usr/bin/perl

use strict;
use warnings;


my $year = shift @ARGV;
my $file = shift @ARGV;
getNodes($year,$file);

sub getNodes() {

--------------^^
Are you sure this is the prototype you want for you sub? It doesn't
seem to go along with the actual usage you are making. Or perhaps you
inadvertently places the parens there, not realizing you were defining a
prototype for you sub by so doing?

my $year = shift;
my $file = shift;

open(FILEHANDLE,"citationgraph") or die "Cannot open file $!\n";
open(OUTPUTFILE, ">$file") or die "Cannot open file $!\n";

my @filecontents = <FILEHANDLE>;
my %nodes = {};

-------^^^^^^^^^^^
AHA! This is where your stringified hash reference is coming from. The
RHS of the above defines a scalar value which is a reference to an empty
anonymous hash. The LHS is a hash, so the assignment operator is
expecting a list with an even number of elements. Perl attempts to
DWYM, and so sets up a key in %nodes which is the stringified hash
reference, and an undef value. Which, of course, is not WYM.

But the

use warnings;

above caused Perl to mention that. Did you not pay attention? If you
didn't understand the warning, add:

use diagnostics;

to your program, which will generate a paragraph of explanation for each
warning you receive.

What should you do? Either:

my %nodes;

or

my %nodes=();
 
B

Ben Morrow

Following Some of your suggestions , I made the changes to my code. Let
me know if I can improvise further.

I think you mean 'improve'... :)[1]
#!/usr/bin/perl

use strict;
use warnings;

my $year = shift @ARGV;
my $file = shift @ARGV;

It is usual to omit the @ARGV here, as Perl will assume it.
getNodes($year,$file);

If you declare the sub at the top with
use subs qw/getNodes/;
then you can call it without brackets, viz:
getNodes $year, $file;
which is cleaner.
sub getNodes() {

You do not want the () there. They do not mean what you think they
mean.
my $year = shift;
my $file = shift;

open(FILEHANDLE,"citationgraph") or die "Cannot open file $!\n";

You checked the return of open(): good :).

You would be better off with a more descriptive name than FILEHADLE.

You should also not append "\n" to die() messages, in general, as it
prevents Perl from telling you where the error occurred.

You would also be better off using a 'my' variable for the filehandle,
viz:

open my $GRAPH, "< citationgraph"
or die "Cannot open 'citationgraph': $!";
open(OUTPUTFILE, ">$file") or die "Cannot open file $!\n";

my @filecontents = <FILEHANDLE>;
my %nodes = {};

foreach my $line (@filecontents) {

my ($key,$value) = split(" ",$line);
if($key =~ m/^$year/) {
$key =~ s/\s+$//;

The split on " " will remove all whitespace for you.
if (! defined($nodes{$key})) {

As someone else pointed out, you do not need this test.
$nodes{$key} = 1;
}
}
}

You still need to sort out your indenting. Get an editor which does it
for you (see perldoc -q editor).
my @keys = keys %nodes;
foreach(@keys) {
print OUTPUTFILE "\n$_";

You can get rid of the need for "\n" by putting
$\ = "\n";
at the top; this whole loop can then be written
print for keys %nodes;
which is rather clearer.

Ben

[1] In case anyone should get the wrong idea: I in no way wish to
criticise people with poor English, and I appreciate the effort made
to write a language which is not your own. I seek only to help you
make yourself understood better :).
 
S

Shripathi Guruprasannaraj

Hi Bob,

Thanks a lot for helping me out. There is a blank line as the first line
of my output file. Do you know whats causing that ?

Raj
 
J

James Willmore

What?! This is complete nonsense.

You may omit the 'm' on a m// match if and only if the delimiter you
use is '/'; i.e. 'm|\d|' needs the 'm' but 'm/\d/' could have been
simply written '/\d/'. There is otherwise no significance to it.

$1 refers to what the first bracketed subexpression of the last
*successful* pattern match matched. So

$_ = "ab";
/(a)/; # Now $1 eq 'a'.
/(c)/; # $1 eq 'a' still, as the pattern didn't match.
/b/; # $1 is undef, as the pattern matched but didn't have any
# capturing parentheses.

Yes. You have stated much better than I what I was trying to say.

Thank you.

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Veni, Vidi, Visa.
 
S

Shripathi Guruprasannaraj

Ben said:
Following Some of your suggestions , I made the changes to my code. Let
me know if I can improvise further.


I think you mean 'improve'... :)[1]

=> No I meant improvise.
#!/usr/bin/perl

use strict;
use warnings;

my $year = shift @ARGV;
my $file = shift @ARGV;


It is usual to omit the @ARGV here, as Perl will assume it.

getNodes($year,$file);


If you declare the sub at the top with
use subs qw/getNodes/;
then you can call it without brackets, viz:
getNodes $year, $file;
which is cleaner.

sub getNodes() {


You do not want the () there. They do not mean what you think they
mean.

my $year = shift;
my $file = shift;

open(FILEHANDLE,"citationgraph") or die "Cannot open file $!\n";


You checked the return of open(): good :).

You would be better off with a more descriptive name than FILEHADLE.

You should also not append "\n" to die() messages, in general, as it
prevents Perl from telling you where the error occurred.

You would also be better off using a 'my' variable for the filehandle,
viz:

open my $GRAPH, "< citationgraph"
or die "Cannot open 'citationgraph': $!";

open(OUTPUTFILE, ">$file") or die "Cannot open file $!\n";

my @filecontents = <FILEHANDLE>;
my %nodes = {};

foreach my $line (@filecontents) {

my ($key,$value) = split(" ",$line);
if($key =~ m/^$year/) {
$key =~ s/\s+$//;


The split on " " will remove all whitespace for you.

if (! defined($nodes{$key})) {


As someone else pointed out, you do not need this test.

$nodes{$key} = 1;
}
}
}


You still need to sort out your indenting. Get an editor which does it
for you (see perldoc -q editor).

my @keys = keys %nodes;
foreach(@keys) {
print OUTPUTFILE "\n$_";


You can get rid of the need for "\n" by putting
$\ = "\n";
at the top; this whole loop can then be written
print for keys %nodes;
which is rather clearer.



Ben

[1] In case anyone should get the wrong idea: I in no way wish to
criticise people with poor English, and I appreciate the effort made
to write a language which is not your own. I seek only to help you
make yourself understood better :).
 
B

Ben Morrow

Ben said:
Following Some of your suggestions , I made the changes to my code. Let
me know if I can improvise further.

I think you mean 'improve'... :)[1]

=> No I meant improvise.

Please explain.

AFAIK, 'to improve' means 'to make better', which makes sense; 'to
improvise' means 'to play music which you are making up as you go
along', or, from this, 'to make the best you can of a situation where
there are no rules to follow', which doesn't.
[1] In case anyone should get the wrong idea: I in no way wish to
criticise people with poor English, and I appreciate the effort made
to write a language which is not your own. I seek only to help you
make yourself understood better :).

Ben
 
S

Shripathi Guruprasannaraj

Ben said:
Ben said:
(e-mail address removed) wrote:


Following Some of your suggestions , I made the changes to my code. Let
me know if I can improvise further.

I think you mean 'improve'... :)[1]

=> No I meant improvise.


Please explain.

AFAIK, 'to improve' means 'to make better', which makes sense; 'to
improvise' means 'to play music which you are making up as you go
along', or, from this, 'to make the best you can of a situation where
there are no rules to follow', which doesn't.

=>'to make the best you can of a situation where
there are no rules to follow'

This is a perfect situation in which there are no hard and fast rules
to follow. By improvise I meant something like writing /\d/ instead of
m/\d/ . I know this is not a perfect example but I hope you get what I
am trying to put across.

I could have also said 'improve' which probably fits in better than
improvise.


[1] In case anyone should get the wrong idea: I in no way wish to
criticise people with poor English, and I appreciate the effort made
to write a language which is not your own. I seek only to help you
make yourself understood better :).
=> I definitely understand that you are helping me out. But FYI, English
has been my medium of instruction
since I was 2. I know to read, write , speak English while I can only
speak my mother-tongue (native language).
 
B

Bob Walton

Shripathi said:
Hi Bob,

Thanks a lot for helping me out. There is a blank line as the first line
of my output file. Do you know whats causing that ?

Raj ....

Yes. As I mentioned in my first post, that is because the first thing
you print is a newline -- hence the first line is blank. You probably want:

print OUTPUTFILE "$_\n";
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top