testing whether a number is an integer

ccc31807

This is probably a Dumb Question, but I'll ask it anyway.

I have a number, which I use to validate an array, like this:
$number = scalar($@array) / 3;
If $number is an integer, the array is perfect and can be processed.
If it isn't, it's malformed and must be written to an error log.

Some languages have a predicate like is(integer()) which tests the
obvious. Does Perl have a built-in is_integer() equivalent, or will
I have to install something like Test::Numeric?

Thanks, CC.
 
Uri Guttman

c> This is probably a Dumb Question, but I'll ask it anyway.
c> I have a number, which I use to validate an array, like this:
c> $number = scalar($@array) / 3;

that isn't legal code. $@array makes no sense. i will assume you meant
just @array

c> If $number is an integer, the array is perfect and can be processed.
c> If it isn't, it's malformed and must be written to an error log.

c> Some languages have a predicate like is(integer()) which tests the
c> obvious. Does Perl have a built-in is_integer() equivalent, or will
c> have have to install something like Test::Numeric?

simple math will do it for you. the int func will truncate something to
an integer. so if $foo/3 is an int, its int() value will be the same.

my $int = @array / 3;  # no need for scalar() as / provides scalar context

if ( $int == int( $int ) ) { ... }
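
The comparison above can be wrapped in a small predicate. This is only a sketch: is_integer is the hypothetical name from the original question, not a Perl built-in, and it assumes its argument is already a defined, finite number.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): true when $n has no
# fractional part. Assumes $n is a defined, finite number.
sub is_integer {
    my ($n) = @_;
    return $n == int($n);
}

my @array = (1 .. 12);
my $count = @array / 3;    # / imposes scalar context, so @array yields its length

print is_integer($count) ? "ok: $count groups\n" : "malformed\n";
```

For an array of 12 elements this reports 4 groups; an array of 10 would take the "malformed" branch.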

uri
 
Keith Thompson

Uri Guttman said:
c> This is probably a Dumb Question, but I'll ask it anyway.
c> I have a number, which I use to validate an array, like this:
c> $number = scalar($@array) / 3;

that isn't legal code. $@array makes no sense. i will assume you meant
just @array

c> If $number is an integer, the array is perfect and can be processed.
c> If it isn't, it's malformed and must be written to an error log.

c> Some languages have a predicate like is(integer()) which tests the
c> obvious. Does Perl have a built-in is_integer() equivalent, or will
c> have have to install something like Test::Numeric?

simple math will do it for you. the int func will truncate something to
an integer. so if $foo/3 is an int, its int() value will be the same.

my $int = @array / 3;  # no need for scalar() as / provides scalar context

if ( $int == int( $int ) ) { ... }

I'd say the condition you're really testing is whether the number of
elements in @array is a multiple of 3. So I might write something
like this:

die "Malformed array\n" if scalar @array % 3 != 0;
$number = scalar @array / 3;

I know you want better error handling than die(); this is just
an example.

And yes, I know the "scalar" operator is not strictly necessary.
IMHO it makes the code more readable.
 
Jürgen Exner

Uri Guttman said:
c> This is probably a Dumb Question, but I'll ask it anyway.
c> I have a number, which I use to validate an array, like this:
c> $number = scalar($@array) / 3;
c> If $number is an integer,
c> the array is perfect and can be processed.
c> If it isn't, it's malformed and must be written to an error log.

Is this a complicated way of testing if the number of elements in @array
is a multiple of 3?

Then why don't you use the modulo operator?
print "Not a multiple of 3 elements" if @array % 3;

jue
 
Jürgen Exner

Uri Guttman said:
simple math will do it for you. the int func will truncate something to
an integer. so if $foo/3 is an int, its int() value will be the same.

my $int = @array / 3;  # no need for scalar() as / provides scalar context

if ( $int == int( $int ) ) { ... }

While probably not a problem in this case I would be _VERY_ wary of
using this code in general. Any rounding error caused by binary
arithmetic will bite you and therefore I would not ever use it, not even
if the division should yield an integer in the case of success.

Why not simply use modulo? It is meant for exactly this purpose and it
does not suffer from rounding errors.
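
To illustrate the concern, here is a minimal sketch. It assumes a Perl build whose floating-point type is an IEEE 754 double (the common case); the values are chosen purely for demonstration and are not from the original poster's data.

```perl
use strict;
use warnings;

# A value that "should" be 3 but is computed in binary floating point.
my $x = (0.1 + 0.2) * 10;

# On a typical double-precision build $x ends up slightly above 3,
# so the int() comparison reports "not an integer".
print "int() test:  ", ($x == int($x) ? "integer" : "not an integer"), "\n";

# Modulo on an element count involves only integers, so nothing is rounded.
my @array = (1 .. 12);
print "modulo test: ", (@array % 3 ? "not a multiple of 3" : "multiple of 3"), "\n";
```

Dividing a small element count by 3 does not hit this problem, which is why the int() test happens to work here; the modulo test simply never depends on it.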

jue
 
Jürgen Exner

Jürgen Exner said:
Is this a complicated way of testing if the number of elements in @array
is a multiple of 3?

Then why don't you use the modulo operator?
print "Not a multiple of 3 elements" if @array % 3;

Come to think of it, I have a very strong feeling that this is another
X-Y problem, caused by choosing a poor data structure.

If the number of elements in that array must be a multiple of 3 then
(unless there are some extraordinary circumstances) this implies that
the data is not a plain list of single elements but it is a list of
triplets. Had the OP used a proper data structure to represent this
fact, e.g. an array of triplets(*), then the integrity of his data would
be ensured by the data structure and we would not have this discussion
in the first place.

*: each triplet could be a hash or an array with 3 elements, depending
on the kind of data in each triplet.
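
As a rough sketch of what such a structure might look like, here is an array whose elements are references to 3-element arrays. The values are placeholders for illustration, since the actual data had not been described at this point in the thread.

```perl
use strict;
use warnings;

# Each element is one triplet, stored as a reference to a 3-element array.
# The values are placeholders for illustration.
my @triplets = (
    [ 'a1', 'b1', 'c1' ],
    [ 'a2', 'b2', 'c2' ],
);

# The number of triplets is just the array length; no division is needed.
my $count = @triplets;

for my $t (@triplets) {
    my ($first, $second, $third) = @$t;
    print "$first $second $third\n";
}
```

Perl does not enforce the inner length, so this makes the intent explicit rather than guaranteeing it; whatever builds @triplets still has to produce complete triplets.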

jue
 
ccc31807

This is a source file from a database of student courses. The source
file contains records like this:
ID,LAST,FIRST,MIDDLE,MAJOR, ... [COURSES]
where [COURSES] depends on the student enrollment in the term, which
can be from zero up to possibly 7 or 8, and would be as follows:
ENG-101,BIO-202,ART-303,ABCD,BCDE,CDEF,N,N,X
These values are triplets, with all the courses first, then all the
sections, then all the statuses.

When I parse the line, I collect the individual data items into loop
variables like this:
my ($id, $last, $first, $middle, $major ... , @courses) =
parse_line();
The @courses array then contains the enrollment info. I divide it by
3, which gives me the number of courses. I then munge the @courses
data (actually by turning it into a series of strings like this:
"ENG-101-ABC-N")
The other data goes into a hash keyed on the ID, so I can print the
reports like this:

foreach my $k (keys %students)
{
    print OUT qq($k,$students{$k}{last}, ... \n);
}

The vast majority of the time the @courses array is perfect, but
rarely it is malformed in some way, thus requiring me to check the
format of the array.

I have been doing it by taking the size modulo 3 and checking to see if the result
is not zero, but it strikes me that, if I could check to see if the
result of the division by 3 is an integer, I wouldn't have to resort
to the extra step.

CC.
 
Keith Thompson

ccc31807 said:
The vast majority of the time the @courses array is perfect, but
rarely it is malformed in some way, thus requiring me to check the
format of the array.

I have been doing it by moding by 3 and checking to see if the result
is not zero, but it strikes me that, if I could check to see if the
result of the division by 3 is an integer, I wouldn't have to resort
to the extra step.

There's an extra step either way: either you need to check whether
the number of fields is a multiple of 3, or you need to check
whether the result of dividing that number by 3 is an integer.

There is, of course, More Than One Way To Do It. I suggest that
checking whether the number of fields is a multiple of 3 expresses
the intent more clearly.
 
sln

Whatever you are doing to get the courses array populated should
be valid before population.

A clear sign of bad validation or parsing technique is that you
actually have to do a modulo on the finished array.
The finished array should be pristine.
If you have a remainder, the entire array is flawed.
The place to find flaws is before the array is populated, not after.
Craft a better parsing strategy.

-sln
 
ccc31807

sln said:
Whatever you are doing to get the courses array populated should
be valid before population.

I'm not 'doing' anything before populating the array. I'm reading a
file line by line, placing all the singular datums in appropriate
variables ($id, $last, $first, etc.), and whatever is left over I glob
into an array. I don't know how many items remain in the line at this
point -- it could be nothing.

sln said:
A clear sign of bad validation or parsing technique is that you
actually have to do a modulo on the finished array.
The finished array should be pristine.

In theory, yes. However, in practice the 'array' is simply a list of
however many datums remain in the line. I guess I could test the line
before parsing to see how many 'items' it contains, but that would
really be an extra step.

sln said:
If you have a remainder, the entire array is flawed.
The place to find flaws is before the array is populated, not after.
Craft a better parsing strategy.

Such as? Here's my logic. You tell me if you see a better way. Assume
that each line looks like this:
123,Smith,John,Q,ENG-101,BIO-202,ART-303,ABCD,BCDE,CDEF,N,N,X

my %students;
while (<INFILE>)
{
    next unless /\w/;
    chomp;
    my ($id, $last, $first, $middle, @courses) = parse_line();
    $students{$id} = {
        last   => $last,
        first  => $first,
        middle => $middle,
    };
    my $number = scalar(@courses) / 3;
    my $mod = scalar(@courses) % 3;
    unless ($mod == 0) { warn "MALFORMED $_\n"; }
    else {
        # munge @courses based on the value of $number
        # construct a $course variable for each section and status
        # then do something like this
        push @{$students{$id}{courses}}, $course;
    }
}

I would like to replace the unless test with something like this:
(is_integer($number)) Ideas?

CC.
 
sln

ccc31807 said:
I'm not 'doing' anything before populating the array. I'm reading a
file line by line, placing all the singular datums in appropriate
variables ($id, $last, $first, etc.), and whatever is left over I glob
into an array. I don't know how many items remain in the line at this
point -- it could be nothing.

ccc31807 said:
In theory, yes. However, in practice the 'array' is simply a list of
however many datums remain in the line. I guess I could test the line
before parsing to see how many 'items' it contains, but that would
really be an extra step.

ccc31807 said:
Such as? Here's my logic. You tell me if you see a better way. Assume
that each line looks like this:
123,Smith,John,Q,ENG-101,BIO-202,ART-303,ABCD,BCDE,CDEF,N,N,X
^^^
I'm going to make a guess that this line is generated.
AND that something about the triplet is significant.

Your code does this:

my (whole bunch of scalar variables, @array) = parse_this(
'123,Smith,John,Q,ENG-101,BIO-202,ART-303,ABCD,BCDE,CDEF,N,N,X'
);

The first 4 fields (?) are '123,Smith,John,Q,'.
If there is no middle name, I assume it's this:
'123,Smith,John,,'.

This:
'ENG-101,BIO-202,ART-303,'
looks like 3 courses. I guess everybody takes no more
or less than three, therefore 3 every time.

The next set of 3 is this:
'ABCD,BCDE,CDEF,'

then this:
'N,N,X'

The only relationship, since grouped in 3's, is that
every 3rd one is related.

So:
ENG-101, ABCD, N
is some kind of a record, with X number of fields all related
and each group of 3 is of the same kind, like this is the same kind:
ENG-101,BIO-202,ART-303

I would assume that you may be able to identify the type of item
in each group. That goes a long way toward validation.

Otherwise, you're in a sea of improbability, where your lone requirement,
that multiple of 3, is just as lost and adrift in the ocean of uncertainty,
as a multiple of 4 or 2 or 1.

-sln
 
ccc31807

sln said:
I'm going to make a guess that this line is generated.
AND that something about the triplet is significant.

The significance is that the database is a non-first-normal-form
database product named Unidata from IBM, with multi-valued fields, so
that we pull from three fields, courses, sections, and statuses, and
the values in the fields are 'associated', which explains why the
output is in the form that it's in.

sln said:
The only relationship, since grouped in 3's, is that
every 3rd one is related.

No, no, no! There are three GROUPS of fields, from zero up to a number
close to 9 or 10. In my example, if you divide @courses by 3, what you
get is the number of course sections that the student has enrolled in.
If the student has enrolled in no courses, the size of @courses is 0,
if he has enrolled in 10 courses, the size would be 30.

sln said:
So:
  ENG-101, ABCD, N
is some kind of a record, with X number of fields all related
and each group of 3 is of the same kind, like this is the same kind:
 ENG-101,BIO-202,ART-303

Yes, except your loop would do something like this:
for (my $i = 0; $i < $number; $i++)
{
    my $crs = $courses[$i];
    my $sec = $courses[$i + $number * 1];
    my $sta = $courses[$i + $number * 2];
    my $record = sprintf("%s-%s %s", $crs, $sec, $sta);
}

sln said:
I would assume that you may be able to identify the type of item
in each group. That goes a long way toward validation.

I had used a RE to do this, but (as it turns out) there's enough
variation in the record to confuse a RE. Some of the courses look like
sections, and some of the sections look like courses.

sln said:
Otherwise, you're in a sea of improbability, where your lone requirement,
that multiple of 3, is just as lost and adrift in the ocean of uncertainty,
as a multiple of 4 or 2 or 1.

The data is contained in three groups, with each group associated in
order, so for the vast majority of cases using 3 works. The data is
grouped like this:
CRS1,CRS2,CRS3,CRS4,SEC1,SEC2,SEC3,SEC4,STA1,STA2,STA3,STA4
which indicates that the student has exactly four current enrollments.
12 elements divided by 3 groups equals 4 enrollments.
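
The grouped layout described above can be regrouped into one record per enrollment. The following is a sketch under assumptions: the hash keys course, section, and status are names chosen for illustration, and the sample list reuses the example values from earlier in the thread.

```perl
use strict;
use warnings;

# Sample data in the grouped layout: all courses, then all sections,
# then all statuses.
my @courses = qw(ENG-101 BIO-202 ART-303 ABCD BCDE CDEF N N X);

die "Malformed course list\n" if @courses % 3;

my $number = @courses / 3;    # enrollments, i.e. the size of each group
my @records;
for my $i (0 .. $number - 1) {
    push @records, {
        course  => $courses[$i],
        section => $courses[ $i + $number ],
        status  => $courses[ $i + 2 * $number ],
    };
}

# Print each record in the same "%s-%s %s" shape as the loop above.
printf "%s-%s %s\n", @{$_}{qw(course section status)} for @records;
```

With an empty @courses the range 0 .. -1 is empty, so a student with no enrollments simply yields no records.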

I guess my beef is that Perl lacks this kind of predicate -- but I'm
not really complaining. I just got my copy of 'Land of Lisp' by Conrad
Barski, and I intend to post a new thread on c.l.p.m. about some
things that Barski says about string manipulation. Of all the tools I
could use for my job, Perl without any doubt is the best. But that
doesn't mean that some of the predicates Perl lacks couldn't on
occasion be useful.

Thanks, CC.
 
sln

I see, 3 groups is the constant.
It sounds like you're saying:
'I don't know what the codes that a particular group can have are,
and it's possible the same codes can be in more than one group.'

This means you are better off dividing by 3 as you do now.
There are criteria you could use in the case of unknowns
that could get you the form:
1. number of characters
2. character class (like [YN])
3. special punctuation
4. symmetry and order
5. any/all of the above

This, of course, won't validate anything, but it could get
the number (order). But so does dividing by 3.
In that respect, it's a wash; keep what you have.
But be warned, even if it's off by 1, the whole array is
invalid and the data can't be munged.

Take a hard look at all the possible data structures of
each individual group. Notice their obvious differences and
similarities. You do know the group order, so that helps a lot.
If you can take the most dissimilar group out of contention,
it will give you the number (order). Then, after separating the
remaining 2 groups, start the validation process. Once all the
groups are split up, it's OK for them to look similar to a regular
expression, because each pattern is only applied within its own
group. If something doesn't pass within the group,
it's now easy to flag its position to some error log. And, of course,
continue processing.
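
One way to read the "most dissimilar group" idea is sketched below. It rests on an assumption made only for illustration: that every status is a single uppercase letter and no course or section code is. The helper name count_from_statuses is hypothetical, and real data may not be this tidy.

```perl
use strict;
use warnings;

# Assumption for illustration only: every status is a single uppercase
# letter and no course or section code is.
sub count_from_statuses {
    my @fields = @_;
    my $n = 0;
    # Walk backwards from the end while the fields look like statuses.
    $n++ while $n < @fields && $fields[ -1 - $n ] =~ /^[A-Z]$/;
    return $n;
}

my @courses = qw(ENG-101 BIO-202 ART-303 ABCD BCDE CDEF N N X);
my $n = count_from_statuses(@courses);

if ( @courses == 3 * $n ) {
    print "looks consistent: $n enrollments\n";
}
else {
    warn "MALFORMED: expected ", 3 * $n, " fields, found ", scalar(@courses), "\n";
}
```

This gives a second, independent estimate of the enrollment count, so a mismatch with the plain size check points at a malformed row.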

-sln
 
Xho Jingleheimerschmidt

Jürgen Exner said:
Coming to think of it I have a very strong feeling that this is another
X-Y problem, caused by choosing a poor data structure.

It looks quite the opposite to me. He is trying to fix a poor data
structure, by transforming it into a better one.

Jürgen Exner said:
If the number of elements in that array must be a multiple of 3 then
(unless there are some extraordinary circumstances) this implies that
the data is not a plain list of single elements but it is a list of
triplets. Had the OP used a proper data structure to represent this
fact, e.g. an array of triplets(*), then the integrity of his data would
be ensured by the data structure and we would not have this discussion
in the first place.

If you magically wish away the problems you are trying to solve, then
you magically no longer have any problems to solve. Using this
technique, we can avoid any need (or opportunity) to ever use Perl at
all. Simply redefine the problem to exist in some other domain.

Xho
 
C

ccc31807

Bullocks:

    die "Problems in row $." if @courses % 3;

Ha, ha, ha!

Nice and sweet. Except I'll use warn() instead of die() since there's
no reason not to process the rest of the rows if one is malformed.
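
A minimal sketch of the warn-and-continue version follows. It makes several assumptions: a plain split stands in for parse_line(), which was never shown, the sample rows are invented for the demonstration, and an in-memory filehandle replaces the real input file.

```perl
use strict;
use warnings;

# Two sample rows; the second is deliberately malformed (5 course fields).
my $text = <<'END';
123,Smith,John,Q,ENG-101,BIO-202,ART-303,ABCD,BCDE,CDEF,N,N,X
124,Jones,Mary,R,ENG-101,BIO-202,ABCD,BCDE,N
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my (%students, @bad_rows);
while ( my $line = <$in> ) {
    chomp $line;
    next unless $line =~ /\w/;

    # A plain split stands in for parse_line(), which is not shown in the thread.
    my ($id, $last, $first, $middle, @courses) = split /,/, $line;

    if ( @courses % 3 ) {
        warn "Problems in row $.\n";    # report the row and keep going
        push @bad_rows, $.;
        next;
    }
    $students{$id} = { last => $last, first => $first, middle => $middle };
}
close $in;

print "loaded: @{[ sort keys %students ]}; bad rows: @bad_rows\n";
```

Unlike the earlier loop, this sketch stores a student only after the row passes the check, so malformed rows never reach %students.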

Thanks, CC.
 
