convert integer to string

ccc31807

This is embarrassing!

I read an input file, split each row on the separators, pick out the
unique key (which consists of seven digits), create a hash element on
the key, and read the other values into an anonymous hash. This has
worked for years without a hitch.

Yesterday, my user requested the app to print a calculated field,
which is a list of names contained in a secondary file. I thought, "No
problem, I'll just create a new element in my anonymous hash,
concatenate each name unless the name already matched the string, and
print it out." Didn't work, but the errors seemed to be random and
arbitrary.

After a couple of hours, I used bdf's trick to print out the hash, and
discovered much to my embarrassment that the keys weren't the same. In
the main loop where I created the hash, the keys were as expected and
consisted of seven characters which are all digits. HOWEVER, in the
loop where I created the additional hash element, the keys with
leading zeros did not match. Here is an example of the hash:

0123456 => HASH(deadbeed)
id => 0123456
name => joe
gender => male
new_field =>
123456 => HASH(deadbeef)
new_field => list of concatenated strings

I read the main file and create the main hash on the unique key, one
element of which is 'new_field' with an initial value of ' '. Then, I
read a secondary file which contains the key and ATTEMPT to
concatenate an element to the main hash element new_field. Instead, it
creates a new hash element with the numeric value of the unique key
instead of the string value.

So, I guess my question is how I convert an integer to a string to
preserve the leading zeros.

According to MJD on his infrequently asked questions page, the answer
is, "Try using a whip." But that doesn't tell me what kind of whip.
http://perl.plover.com/IAQ/IAQlist.html#how_can_i_force_perl_to_treat_a_number_as_a_string

CC.
 
Ralph Malph

> So, I guess my question is how I convert an integer to a string to
> preserve the leading zeros.
>
> According to MJD on his infrequently asked questions page, the answer
> is, "Try using a whip." But that doesn't tell me what kind of whip.
> http://perl.plover.com/IAQ/IAQlist.html#how_can_i_force_perl_to_treat_a_number_as_a_string

The whip I usually use is string concatenation or variable interpolation
within double quotes.
Something like

my $number=5;
my $string="000".$number;

Surely there are several ways to do this.
This always works for me! :)
 
Uri Guttman

c> After a couple of hours, I used bdf's trick to print out the hash, and
c> discovered much to my embarrassment that the keys weren't the same. In
c> the main loop where I created the hash, the keys were as expected and
c> consisted of seven characters which are all digits. HOWEVER, in the
c> loop where I created the additional hash element, the keys with
c> leading zeros did not match. Here is an example of the hash:

what trick? use Data::Dumper and no trick is needed.

c> 0123456 => HASH(deadbeed)
c> id => 0123456

that is an OCTAL literal. when parsed into perl it will be converted to
an integer and later printed in decimal. if those are always keys which
means strings, never print or use them without quotes. don't think of
them as numbers but strings with all digits.
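To make the octal point concrete, here is a small sketch (the variable names are made up for illustration and are not from the original script). A bare 0123456 typed into Perl source is an octal literal, while the same digits in quotes, or read from a file, stay a string:

```perl
use strict;
use warnings;

# Illustrative names only; none of this is from the original script.
my $from_literal = 0123456;      # leading 0 in source code means octal
my $from_string  = '0123456';    # quoted (or read from a file): a plain string

print "$from_literal\n";         # prints 42798, the decimal value of octal 123456
print "$from_string\n";          # prints 0123456, zeros intact
```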

c> So, I guess my question is how I convert an integer to a string to
c> preserve the leading zeros.

if you already have corrupted numbers, use sprintf to pad them with
zeros. but i suspect you will still have nasty errors as you will have
converted from the literal octal value now to a decimal. you can sprintf
the number back in octal to compensate. the proper solution is to never
let perl see those as literal octal numbers but always as strings. i
dunno what your code is doing (as you didn't post any) to make this
happen. are you doing an eval on some incoming data? that is a no-no! do
a proper parse and you can keep those keys as strings.

uri
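
A minimal sketch of the sprintf repair described above, assuming the keys are always seven digits wide. The values are hypothetical stand-ins for ids that were already damaged; this only patches the symptom and does not replace keeping the keys as strings in the first place.

```perl
use strict;
use warnings;

# Hypothetical values standing in for ids that already lost their zeros.
my $stripped = 123456;                       # was '0123456' before the zero was dropped
my $padded   = sprintf '%07d', $stripped;    # pad to seven digits with zeros

my $octal_damaged = 42798;                   # what the source literal 0123456 evaluates to
my $restored      = sprintf '%07o', $octal_damaged;   # format back as octal digits

print "$padded\n";      # prints 0123456
print "$restored\n";    # prints 0123456
```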
 
ccc31807

> dunno what your code is doing (as you didn't post any) to make this
> happen. are you doing an eval on some incoming data? that is a no-no! do
> a proper parse and you can keep those keys as strings.

This is what I'm doing.

The main file looks like this:
0123456|joe|male|etc ...

which I manipulate as follows:
my ($id, $name, $gender, @rest) = split /\|/;
$main_hash{$id} = {
id => $id,
name => $name,
gender => $gender,
rest => @rest,
new_value => ' ',
};

The secondary file looks like this:
0123456|this|etc ...
0123456|is|etc ...
0123456|a|etc ...
0123456|list|etc ...
0123456|of|etc ...
0123456|strings|etc ...

which I manipulate as follows:
my ($id, $string, @rest) = split /\|/;
# the strings can be duplicated but I only want one of each
$main_hash{$id}{new_value} .= $string
unless $main_hash{$id}{new_value} =~ /$string/;

So, should I double quote the $id when I use it as a hash key? That
strikes me as idiosyncratic even for Perl.

CC.
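
One side note on the "only want one of each" test: matching against the concatenated string also rejects any name that is a substring of an earlier one (after "this" has been appended, "is" already matches). A sketch of an alternative using a lookup hash follows; %seen and the sample lines are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

my %main_hash = ('0123456' => { new_value => '' });
my %seen;    # hypothetical helper: $seen{$id}{$string} is true once appended

for my $line ('0123456|this|etc', '0123456|is|etc', '0123456|this|etc') {
    my ($id, $string) = split /\|/, $line;
    next if $seen{$id}{$string}++;          # skip exact repeats only
    $main_hash{$id}{new_value} .= "$string ";
}

print "[$main_hash{'0123456'}{new_value}]\n";    # prints [this is ]
```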
 
J. Gleixner

ccc31807 said:
> This is what I'm doing.
>
> The main file looks like this:
> 0123456|joe|male|etc ...
>
> which I manipulate as follows:
> my ($id, $name, $gender, @rest) = split /\|/;
> $main_hash{$id} = {
> id => $id,
> name => $name,
> gender => $gender,
> rest => @rest,
> new_value => ' ',
> };
>
> The secondary file looks like this:
> 0123456|this|etc ...
> 0123456|is|etc ...
> 0123456|a|etc ...
> 0123456|list|etc ...
> 0123456|of|etc ...
> 0123456|strings|etc ...
>
> which I manipulate as follows:
> my ($id, $string, @rest) = split /\|/;
> # the strings can be duplicated but I only want one of each
> $main_hash{$id}{new_value} .= $string
> unless $main_hash{$id}{new_value} =~ /$string/;
>
> So, should I double quote the $id when I use it as a hash key? That
> strikes me as idiosyncratic even for Perl.
>
> CC.

Provide actual code that we can run that shows your issue and so we can
see what's happening.
 
J. Gleixner

ccc31807 said:
> This is embarrassing!
>
> I read an input file, split each row on the separators, pick out the
> unique key (which consists of seven digits), create a hash element on
> the key, and read the other values into an anonymous hash. This has
> worked for years without a hitch.
>
> Yesterday, my user requested the app to print a calculated field,
> which is a list of names contained in a secondary file. I thought, "No
> problem, I'll just create a new element in my anonymous hash,
> concatenate each name unless the name already matched the string, and
> print it out." Didn't work, but the errors seemed to be random and
> arbitrary.
>
> After a couple of hours, I used bdf's trick to print out the hash, and
> discovered much to my embarrassment that the keys weren't the same. In
> the main loop where I created the hash, the keys were as expected and
> consisted of seven characters which are all digits. HOWEVER, in the
> loop where I created the additional hash element, the keys with
> leading zeros did not match. Here is an example of the hash:

Most folks really don't need to know the backstory. Post your code,
your results, your expectations, and your questions.
 
ccc31807

> Provide actual code that we can run that shows your issue and so we can
> see what's happening.

Here is the working code in my test script. Unfortunately, I can't
post the data files. Note that this code does some other things and
contains debugging statements.

open FAC, '<', "FAC_${term}.csv", or die "Cannot open FAC, $!";
chomp ($header = <FAC>);
while (<FAC>)
{
next unless /\w/;
chomp;
my ($changed, $last, $first, $middle, $id2, $region, $contract,
$addy1, $addy2, $csz, $mail, $trs, @courses) = parse_line(',', $bool,
$_);
print "id2 is [$id2] and facid is [$facid]\n";
if ($facid !~ /\?/) { next unless $id2 eq $facid; }
$fac{$id2} = {
id2 => $id2,
contract => $contract,
first => $first,
middle => $middle,
last => $last,
addy1 => $addy1,
addy2 => $addy2,
csz => $csz,
mail => $mail,
trs => $trs,
courses => @courses,
xlist => '',
}
}
close FAC;

open SEC, '<', "$sections_file", or die "Cannot open SEC, $!";
chomp ($header = <SEC>);
while (<SEC>)
{
next unless /\w/;
chomp;
s/'/\\'/g;
my ($last, $first, $middle, $id1, $filename, $crs_id, $site, $loc,
$glcode, $level, $count, $status, $section, $title, $hours, $xlist,
$total, $travel, $contract) = parse_line(',', 0, $_);
next if $contract =~ /N/;
$sec{$crs_id} = {
crs_id => $crs_id,
filename => $filename,
id1 => $id1,
site => $site,
loc => $loc,
glcode => $glcode,
level => $level,
count => $count,
status => $status,
section => $section,
title => $title,
hours => $hours,
xlist => $xlist,
total => $total,
travel => $travel,
contract => $contract,
};
$xlist{$id1} .= "$section " if $xlist =~ /\d/ and $xlist !~ /$section/;
$fac{$id1}{xlist} .= "$section " if $xlist =~ /\d/;
}
close SEC;
 
Uri Guttman

c> This is what I'm doing.

c> The main file looks like this:
c> 0123456|joe|male|etc ...

c> which I manipulate as follows:
c> my ($id, $name, $gender, @rest) = split /\|/;
c> $main_hash{$id} = {
c> id => $id,
c> name => $name,
c> gender => $gender,
c> rest => @rest,

that is wrong as it will put the whole array there. you need a ref to
that array or an anon ref.

c> new_value => ' ',
c> };

c> The secondary file looks like this:
c> 0123456|this|etc ...
c> 0123456|is|etc ...
c> 0123456|a|etc ...
c> 0123456|list|etc ...
c> 0123456|of|etc ...
c> 0123456|strings|etc ...

c> which I manipulate as follows:
c> my ($id, $string, @rest) = split /\|/;
c> $main_hash{$id}{new_value} .= $string unless $main_hash{$id}
c> {new_value} =~ /$string/; #the strings can be duplicated but I only
c> want one of each

c> So, should I double quote the $id when I use it as a hash key? That
c> strikes me as idiosyncratic even for Perl.

in that limited code i don't see where the keys would be interpreted as
literal numbers. there must be something else going on which is doing
that. perl won't lose leading zeroes in strings without doing some
number conversions. are you sure you never look at those keys as
numbers? like use == to check them or similar? since they seem to be
fixed size you can always use the string comparison ops safely.

uri
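
A short sketch of the kind of accidental number conversion described above. The sample hash and the "+ 0" are assumptions chosen for illustration; any numeric operation on the key has the same effect.

```perl
use strict;
use warnings;

my %fac = ('0123456' => { name => 'joe' });    # sample data, not from the real files
my $id  = '0123456';

# Any arithmetic turns the string into the number 123456.
my $numified = $id + 0;

print exists $fac{$id}       ? "string key found\n"  : "string key missing\n";
print exists $fac{$numified} ? "numeric key found\n" : "numeric key missing\n";

# == compares as numbers and hides the difference; eq/ne compare as strings.
print "== says equal\n"     if $id == $numified;
print "ne says different\n" if $id ne $numified;
```

With these values the string key is found and the numified key is not, even though == treats the two as equal.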
 
Uri Guttman

c> Here is the working code in my test script. Unfortunately, I can't
c> post the data files. Note that this code does some other things and
c> contains debugging statements.

c> open FAC, '<', "FAC_${term}.csv", or die "Cannot open FAC, $!";
c> chomp ($header = <FAC>);
c> while (<FAC>)
c> {
c> next unless /\w/;
c> chomp;
c> my ($changed, $last, $first, $middle, $id2, $region, $contract,
c> $addy1, $addy2, $csz, $mail, $trs, @courses) = parse_line(',', $bool,
c> $_);
c> print "id2 is [$id2] and facid is [$facid]\n";
c> if ($facid !~ /\?/) { next unless $id2 eq $facid; }
c> $fac{$id2} = {
c> id2 => $id2,
c> contract => $contract,
c> first => $first,
c> middle => $middle,
c> last => $last,
c> addy1 => $addy1,
c> addy2 => $addy2,
c> csz => $csz,
c> mail => $mail,
c> trs => $trs,
c> courses => @courses,

same bug as i pointed out in another post. you need a ref or anon array
there. that is very wrong. who knows what it is doing to your app?

c> xlist => '',
c> }

uri
 
ccc31807

> '%main_hash' is an appallingly bad name for a variable. Why is it there?
> What's it got in it? (In this case, probably, something like '%people'
> might be better.)

Sorry. I posted the code where I populated the two hashes, named %fac
and %sec.

> You want \Q\E here.

Trying that now.

> So you must be doing something else.

Not intentionally.
CC.
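
For reference, a sketch of what the \Q\E advice changes when a variable is interpolated into a match. The sample values are made up, since the real section names are not shown in the thread.

```perl
use strict;
use warnings;

# Made-up sample values; the real section names are not shown in the thread.
my $xlist   = 'A1B2 ';
my $section = 'A.B2';    # the dot is a regex metacharacter

# Unquoted, the dot matches any character, so 'A1B2' looks like a match.
print "plain:  ", ($xlist =~ /$section/     ? "match" : "no match"), "\n";

# \Q...\E escapes metacharacters, so only a literal 'A.B2' would match.
print "quoted: ", ($xlist =~ /\Q$section\E/ ? "match" : "no match"), "\n";
```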
 
ccc31807

> c> courses => @courses,
>
> same bug as i pointed out in another post. you need a ref or anon array
> there. that is very wrong. who knows what it is doing to your app?

@courses contains a list of numeric keys. If I:
print "Courses: [@courses]\n";
it will output something like this:
Courses: [23456 34567 45678]

This is NOT the problem here. What this does is make the hash element
$fac{$id}{courses} contain a scalar value like this:
'23456 34567 45678' This works perfectly and does exactly what I want
it to.

But thanks for pointing this out, CC.
 
Willem

ccc31807 wrote:
)> c> courses => @courses,
)>
)> same bug as i pointed out in another post. you need a ref or anon array
)> there. that is very wrong. who knows what it is doing to your app?
)
) @courses contains a list of numeric keys. If I:
) print "Courses: [@courses]\n";
) it will output something like this:
) Courses: [23456 34567 45678]
)
) This is NOT the problem here. What this does is make the hash element
) $fac{$id}{courses} contain a scalar value like this:
) '23456 34567 45678' This works perfectly and does exactly what I want
) it to.

No, it doesn't.

It would if you spelled it like this:

courses => "@courses",

But now, it would make:

$fac{$id}{courses} contain '23456', and
$fac{$id}{34567} contain '45678'.

And count yourself lucky that there are an odd number of elements,
otherwise the following keys and values would be swapped around.
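
A runnable sketch of the flattening described above, using the three course numbers from the quoted example. The hash names are illustrative only.

```perl
use strict;
use warnings;
use Data::Dumper;

my @courses = (23456, 34567, 45678);

# The array flattens into the surrounding list, so this is really
# (courses => 23456, 34567 => 45678).
my %flattened = (courses => @courses);

# Interpolating in double quotes joins the elements with spaces first.
my %joined = (courses => "@courses");

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%flattened, \%joined);
```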


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
J. Gleixner

ccc31807 said:
> Here is the working code in my test script. Unfortunately, I can't
> post the data files. Note that this code does some other things and
> contains debugging statements.

How can we run this?

Create a short example we can run that shows the issue. Just
populate an array or two with example data and use that
in place of your files.

Narrow your example code down to the issue, get rid of all the other
stuff and you might find the problem on your own.
 
ccc31807

> ) This is NOT the problem here. What this does is make the hash element
> ) $fac{$id}{courses} contain a scalar value like this:
> ) '23456 34567 45678' This works perfectly and does exactly what I want
> ) it to.
>
> No, it doesn't.

I beg to differ, but it does. I've been running this particular piece
of code for about three years, and it has exactly the behavior I
described. This is a line from my debugging file with only the
personal information replaced with XXXXX.

1073251 => HASH(0x1e4ea6c)
xlist =>
middle => A
first => XXXXX
contract => XXXXX
csz => XXXXX
addy2 =>
courses => 235519 235524 237125
last => XXXXX
id2 => 1073251
addy1 => XXXXX Drive
mail => XXXXX@XXXXX
trs => Current Member
 
ccc31807

> This is embarrassing!

Okay, now I really am embarrassed. It's not a Perl problem at all --
it's a Microsoft problem.

The script that I reference is the third one out of four, the whole
process takes about six input files and outputs several thousand PDF
files. I get the data from various people and a couple of databases.

I get some of the data in CSV format. One of my sources switched from
an Access database to an Excel file. Turns out that Excel strips out
the leading zeros if it thinks that the datum is an integer.

I really, really should have learned this lesson by now -- check the
code, check the data. Yes, I mostly validate the data as it comes in,
checking the format and so on, and the particular numeric datum I used
as a key validated as numeric. It never occurred to me to look at the
data file until after I spent several hours checking and rechecking my
code and posting on c.l.p.m.

'Garbage in, garbage out' isn't always the result of bad code, it can
be the result of bad data. Thanks to all, and please accept my apology
for the excitement.

CC.
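
A sketch of a stricter input check for this situation: instead of accepting any numeric value, require exactly seven digits, and re-pad values that look like ids with stripped zeros. normalize_id is a hypothetical helper, and silently repairing short ids is an assumption that may not suit every data source.

```perl
use strict;
use warnings;

# Hypothetical helper: accept an id only if it is exactly seven digits;
# re-pad ids that are all digits but shorter (zeros stripped upstream).
sub normalize_id {
    my ($raw) = @_;
    return $raw                  if $raw =~ /\A\d{7}\z/;
    return sprintf('%07d', $raw) if $raw =~ /\A\d{1,6}\z/;
    die "Unexpected id format: [$raw]\n";
}

print normalize_id('0123456'), "\n";    # prints 0123456
print normalize_id('123456'),  "\n";    # prints 0123456
```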
 
Uri Guttman

c> I beg to differ, but it does. I've been running this particular piece
c> of code for about three years, and it has exactly the behavior I
c> described. This is a line from my debugging file with only the
c> personal information replaced with XXXXX.

then that is not the code that you are using. it will put the list of
courses into the hash as key/value pairs. the only way you get what you
claim is with "@courses". did you lose the quotes in pasting? if you
claim that, show exact runnable code that does this. you can whip up a
dummy example in 2 minutes. here is one:

perl -MData::Dumper -e '@x = ( 1 .. 4 ) ; %y = (x => @x); print Dumper \%y'
$VAR1 = {
'4' => undef,
'x' => 1,
'2' => 3
};

as seen, it doesn't do what you claim it does. possibly the dump trick
you are using is misleading you. use data::dumper to see what is really
there. here is what you seem to want:

perl -MData::Dumper -e '@x = ( 1 .. 4 ) ; %y = (x => "@x"); print Dumper \%y'
$VAR1 = {
'x' => '1 2 3 4'
};


or alternatively with a ref:

perl -MData::Dumper -e '@x = ( 1 .. 4 ) ; %y = (x => \@x); print Dumper \%y'
$VAR1 = {
'x' => [
1,
2,
3,
4
]
};


uri
 
Uri Guttman

c> 'Garbage in, garbage out' isn't always the result of bad code, it
c> can be the result of bad data. Thanks to all, and please accept my
c> apology for the excitement.

you still have a bug if you claim x => @y will do what you want. see my
other post on this.

uri
 
ccc31807

> you still have a bug if you claim x => @y will do what you want. see my
> other post on this.

-----------SCRIPT---------------
#! perl
# array.plx
use strict;
use warnings;
my %presidents;
while (<DATA>)
{
chomp;
my ($order, $first, $last, @years) = split /\|/;
$presidents{$order} = {
first => $first,
last => $last,
years => @years,
};
}

foreach my $k (sort keys %presidents)
{
print "$k => $presidents{$k}\n";
foreach my $k2 (sort keys %{$presidents{$k}})
{
print " $k2 => $presidents{$k}{$k2}\n";
}
}
exit(0);

__DATA__
1|George|Washington|1788 1792
2|John|Adams|1796
3|Thomas|Jefferson|1800 1804
4|James|Madison|1808 1812
32|Franklin|Roosevelt|1932 1936 1940 1944

----------OUTPUT----------------
D:\PerlLearn>perl array.plx
01 => HASH(0x248e5c)
first => George
last => Washington
years => 1788 1792
02 => HASH(0x182a344)
first => John
last => Adams
years => 1796
03 => HASH(0x182a3b4)
first => Thomas
last => Jefferson
years => 1800 1804
04 => HASH(0x182a8a4)
first => James
last => Madison
years => 1808 1812
32 => HASH(0x183ce44)
first => Franklin
last => Roosevelt
years => 1932 1936 1940 1944
 
Martijn Lievaart

> (snip)
> my ($order, $first, $last, @years) = split /\|/;
> $presidents{$order} = {
> first => $first,
> last => $last,
> years => @years,
> };
> (snip)
>
> __DATA__
> 1|George|Washington|1788 1792
> (snip)

This only "works" because @years has only one element, the string "1788
1792". It is still wrong, wrong, wrong. Fix your code before someone tags
on another element on the end and everything breaks.

HTH,
M4
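
A small sketch of why the test script appears to work, using the first data line from the post. The %president hash and the array reference are one possible way to make the intent explicit, not the poster's code.

```perl
use strict;
use warnings;

my $line = '1|George|Washington|1788 1792';
my ($order, $first, $last, @years) = split /\|/, $line;

# The years are separated by spaces, not pipes, so split /\|/ leaves
# them as a single string and @years has exactly one element.
print scalar(@years), "\n";    # prints 1

# One way to make the intent explicit: split the field on whitespace
# and store an array reference holding the individual years.
my %president = (
    first => $first,
    last  => $last,
    years => [ split ' ', $years[0] ],
);
print scalar(@{ $president{years} }), "\n";    # prints 2
```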
 
ccc31807

> @years always contains exactly one element, it is a non-arrayish array.
>
> $years would work as well, and would avoid looking like it wouldn't
> work...

You are right.

As an explanation, not an excuse, the &rest parameter in Lisp takes
the rest of the arguments and flattens all lists. I've found this very
useful in manipulating Lisp data, and guess I was half asleep at the
wheel, channeling Lisp while writing Perl.

What I saw was 'courses' as an array, and in fact use @courses later
on in the script to iterate through the elements, and was thinking
'list' when I should have seen 'scalar.'

My bad, and now I'm triple embarrassed. Uri and the others were
correct, and I wasn't.

CC.
 
