Perl Peeves


Bruce Cook

Peter said:
On 2009-01-29 22:54, Bruce Cook
Peter J. Holzer wrote:
(5) That "special false". I was going nuts trying to figure out
what was different between

[ '' in string context, 0 in numeric context ] [...]
Anyone who applies a numeric operator to a logical value and expects
consistent results is asking for trouble. [...]
In strongly-typed languages you would get a compiler or run-time
error/warning; however, perl is a scripting language and is built to be
flexible: it assumes you know what you're doing and will silently oblige
even the most horrendous abuses.

You actually have the same issue in C: false is defined as 0 and true
is !false.

However, in C, !0 is defined as 1.

This is where lots of people make mistakes in C.

You seem to be one of them.

if(a) is not the same as
if(a == !0)

I didn't say that. I said !0 is the same thing as 1 which means, that

if (a == !0)

is exactly the same as

if (a == 1)

While

if (a)

is the same thing as

if (a != 0)

Obviously (a != 0) and (a == 1) are not the same thing.

Yep, I missed the distinction in your original statement. I don't use the
!0 construct, as I find it rather pointless, so I read it as "not zero".
Apologies.

[...]
Then they were not C compilers.

Each of the operators < (less than), > (greater than), <= (less than
or equal to), and >= (greater than or equal to) shall yield 1 if the
specified relation is true and 0 if it is false. The result has
type int.

(ISO 9899:1999. A similar definition is in Appendix A, section
7.6 of K&R I, German translation; I don't have the original.)

Certainly the first ISO draft said that. I don't think I remember it in K&R,
but unfortunately I can't find my copy, so I'll believe you.
So the result of

(-2 > 0)

would be -2 (i.e., true)? I think you should think about this a bit
more.

You're right, I think I was thinking of ==

I am quite sure of this as it was one of the first nasties that bit me when
I first started using C. Functioning code on one platform was just being
strange on another (probably Aztec C or BDS C). The development platform
was VMS and TOPS-10, which had a more modern C compiler (as modern as they
got in '83), and we were back-porting to CP/M and RSX systems.
If (-2 > 0) is true that's pretty broken by any measure. If a compiler
produces a different result than the C specification says it should, it
is still broken.

Certainly. I must admit to not being fully familiar with the standards.
When the original drafts were published the whole thing was a bit of a
shit-fight, with various large organisations attempting to wedge their own
self-serving bits into the standard. I was interested in the process at the
start; however, as things got more stupid I rapidly lost interest and just
got on with programming. I haven't really revisited it since.

The first standard took a long time to appear, and it was a very long time
before we saw any compilers that believed in the standards, much less obeyed
them.

[...]
However, every version of perl (note: lower case 'p') since at least
perl4 (probably perl1) has always returned 1 as the true value for the
comparison operators (==, !=, <, >, <=, >=, eq, ne, lt, gt, le, ge) and
the logical negation (!, not). It is not clear to me whether this is
undocumented because @Larry want to keep the option of changing it one
day (in this case, why is the false value documented? The ''/0 mixture
is rather bizarre and it seems much more likely that one would like to
change that in some future revision of the language), or whether it is
simply undocumented because it's the same as in C and "people know that
anyway, so we don't have to write it down". (Perl documentation has
become more self-sufficient over the years, but originally it assumed a
lot of general Unix knowledge and there are still spots which are only
comprehensible if you have a background as a Unix/C programmer - this
might be one of them)

Yep, I see your point. I personally don't treat the result of a comparison
operator as a useful int, so it's irrelevant to me. Obviously others like
yourself who do program in this manner would find it frustrating to have a
commonly used tool not have an officially defined behavior.

The way perl was put together originally was fairly loose, which gave perl
its flexibility. This created quite a few language "features" that people
discovered and started using, which either became official language
constructs or Larry had to break or implement properly/differently in
subsequent versions. There were also a lot of experimental features released
that were often replaced by a better way of doing it in a later revision
(objects, anyone?), all of which meant the documentation suffered quite a
bit.

As you say, more recently the documentation has become quite good, though
even now I find some things I know about that I can't find a good
explanation for in the docs.


Bruce
 

Uri Guttman

PJH> This may have been true on the PDP-11, although I doubt it. It certainly
PJH> wasn't a common feature of CPUs at the time. Most only set the flags as
PJH> the result of some computation, not a simple MOV. (There is a special
PJH> TST instruction to set the status flags depending on the contents of a
PJH> register)

as someone who did tons of pdp-11 assembler work, let me clarify some
things (as well as my faulty ancient ram can do). the 11 had registers
but could directly access any ram location for any major op (of which
there weren't that many :). you didn't have to load into a reg to do a
test/branch (unlike risc designs or old 8 bitters). and yes, it did have
the Z bit so you could test for zero or not zero (along with negative
and overflow). the coolest thing was to do a test (or look at the
results of an op that set the flag bits) and do more than one branch on
the results. such as decrementing towards zero. branch on zero first to
handle a counted down (e.g. a timer) value. then branch on negative to
handle an overcounted value (i would set a timer value to 0 so the next
tick down to a negative would disable it). then fall through to handle
the positive case (not finished counting down).

MOV didn't set the flags since it didn't do any real operation. there
was a TST op just to set the flags based on a single value with no op
being done.

and yes, the ++/-- ops and c's early handling of 0/not-0 were inherited
from the 11. and so perl can claim inheritance from the pdp-11 and dec's
genius in designing its instruction set. still my fave of all time. not
the most powerful, but the most elegant given its goals and
restrictions.

PJH> If (-2 > 0) is true that's pretty broken by any measure. If a
PJH> compiler produces a different result than the C specification
PJH> says it should, it is still broken.

that would be very nasty. it implies using the SUB op for GT/LT
comparisons which is not cool at any level. i know it was done in
assembler but the Z flag was checked for comparisons, not the negative
flag. if some compiler decided to use the actual result for the boolean
value and it was used in an expression, you already said it gets nasty.

PJH> Who said I don't like it? I never said such a thing. I don't even think
PJH> it is very complex. The number 0, the strings '' and '0' and the undef
PJH> value are false, all other scalars are true.

i agree this is a simple and clear rule. why people fuss over it, i
don't know. let perl be perl. forcing a specific value to be the one
'true' false (sic :) is a waste of time and anal. it is like normalizing
for no reason and it bloats the code. even the concept of using the
result of a boolean test for its numeric or string value bothers me. a
boolean test value (of any kind in any lang) should only be used in a
boolean context. anything else is a slippery shortcut that makes the
code more complex and harder to read. saving a boolean and loading it
again is fine but the serialized format shouldn't matter. we are not
talking config files with a boolean entry here - that is another story.

PJH> day (in this case, why is the false value documented? The ''/0 mixture
PJH> is rather bizarre and it seems much more likely that one would like to
PJH> change that in some future revision of the language), or whether it is
PJH> simply undocumented because it's the same as in C and "people know that
PJH> anyway, so we don't have to write it down". (Perl documentation has
PJH> become more self-sufficient over the years, but originally it assumed a
PJH> lot of general Unix knowledge and there are still spots which are only
PJH> comprehensible if you have a background as a Unix/C programmer - this
PJH> might be one of them)

perl has the magic false of 0 and '' so it will DWIM and coerce as you
would expect given a numeric or string context. it is well defined and
behaves cleanly. not much more to be said but you two can keep fighting.

uri
 

Peter J. Holzer

PJH> day (in this case, why is the false value documented? The ''/0 mixture
PJH> is rather bizarre and it seems much more likely that one would like to
PJH> change that in some future revision of the language), or whether it is
PJH> simply undocumented because it's the same as in C and "people know that
PJH> anyway, so we don't have to write it down". (Perl documentation has
PJH> become more self-sufficient over the years, but originally it assumed a
PJH> lot of general Unix knowledge and there are still spots which are only
PJH> comprehensible if you have a background as a Unix/C programmer - this
PJH> might be one of them)

perl has the magic false of 0 and '' so it will DWIM and coerce as you
would expect given a numeric or string context. it is well defined and
behaves cleanly.

Yes, magic false is well-defined. But why doesn't perl define the value
returned for true by the comparison operators?

hp
 

Tim McDaniel

If you are printing this only for debugging reasons, you probably
know about the context to tell that "nothing visible printed" means
false.

Or I forgot to put that expression into the debug output statement.
I would think that to be unlikely in the case that I snipped, where it
was one variable being dumped, but if I were trying to dump four
variables, the mistake might be easier to overlook. I like to
distinguish when something is deliberately missing from accidentally
missing.
 

A Dude

Well, on a standard terminal, you can't distinguish '0' from '0 ',
either, so that's rather pointless. If you are printing this only for
debugging reasons, you probably know about the context to tell that
"nothing visible printed" means false.

Good point. There is an inherent ambiguity in printing text without
delimiters. And your expectation would play an important role, which I
guess plays into the reason why I wouldn't blanch at the original
unbalanced equals sign.
 

A Dude

Good point. There is an inherent ambiguity in printing text without
delimiters. And your expectation would play an important role, which I
guess plays into the reason why I wouldn't blanch at the original
unbalanced equals sign.

But still that there is a broader class of ambiguity (undelimited
text) does not diminish the idea that there is a distinct ambiguity in
the case mentioned.
 

sln

Peter said:
Peter J. Holzer wrote:

On 2009-01-29 22:54, Bruce Cook
Peter J. Holzer wrote:
(5) That "special false". I was going nuts trying to figure out
what was different between

[ '' in string context, 0 in numeric context ] [...]
Anyone who applies a numeric operator to a logical value and expects
consistent results is asking for trouble. [...]
In strongly-typed languages you would get a compiler or run-time
error/warning; however, perl is a scripting language and is built to be
flexible: it assumes you know what you're doing and will silently oblige
even the most horrendous abuses.

You actually have the same issue in C: false is defined as 0 and true
is !false.
^^^^^
This may be an odd way to look at it since the unary and comparison operators
return 1 for true, where true can be -5 or +299, or Not Zero.
True doesn't necessarily equal 1. So "true = !false" is only a single value
in a set of values. There is only one truth, false = 0.
^^^^^ ^^^^^
I know you mean this in the context stated above, but it looks funny
when put together this way.
In fact, if a is equal to 1, then (a != 0) == (a == 1) is true.

The only thing for sure is false is equal to 0, everything else is true.
Yep, I missed the distinction in your original statement. I don't use the
!0 construct, as I find it rather pointless, so I read it as "not zero".
Apologies.

I'm a little confused; the !0 is not a construct, it's a statement and
definition of the set of values that are true.

I think before the standards committee got their heads around this not too
long ago, false was anything less than or equal to zero, and true was greater
than zero. I remember many old compilers where this was the case. Thus the
period of confusion updating legacy code on the success or failure of
function calls. Still makes old programmers cringe.

sln
 

Hans Mulder

Jürgen Exner said:
Either way, if unary + were to act like the unary -, i.e. evaluating its
argument as a scalar and returning the numerical value of it, that would
be more consistent with at least my intuitive expectations.

Your expectations are off: unary - does not always take the numerical
value of its argument:

$ perl -lwe '$x="blah"; print -$x;'
-blah

Perldoc perlop describes what unary - does with string arguments.

Hope this helps,

-- HansM
 

Bruce Cook

Uri said:
PJH> This may have been true on the PDP-11, although I doubt it. It
PJH> certainly wasn't a common feature of CPUs at the time. Most only
PJH> set the flags as the result of some computation, not a simple MOV.
PJH> (There is a special TST instruction to set the status flags
PJH> depending on the contents of a register)

as someone who did tons of pdp-11 assembler work, let me clarify some
things (as well as my faulty ancient ram can do). the 11 had registers
but could directly access any ram location for any major op (of which
there weren't that many :). you didn't have to load into a reg to do a
test/branch (unlike risc designs or old 8 bitters). and yes, it did have
the Z bit so you could test for zero or not zero (along with negative
and overflow). the coolest thing was to do a test (or look at the
results of an op that set the flag bits) and do more than one branch on
the results. such as decrementing towards zero. branch on zero first to
handle a counted down (e.g. a timer) value. then branch on negative to
handle an overcounted value (i would set a timer value to 0 so the next
tick down to a negative would disable it). then fall through to handle
the positive case (not finished counting down).

MOV didn't set the flags since it didn't do any real operation. there
was a TST op just to set the flags based on a single value with no op
being done.

and yes, the ++/-- ops and c's early handling of 0/not-0 were inherited
from the 11. and so perl can claim inheritance from the pdp-11 and dec's
genius in designing its instruction set. still my fave of all time. not
the most powerful, but the most elegant given its goals and
restrictions.

Yes, I miss the elegance of the '11s. My early years were spent in a
combination of C (often DECUS) and MACRO-11 (often fixing DECUS C :)

Never since working with DEC systems have I dealt with better-documented
systems. System calls were documented with precisely what they would do
under all circumstances, inputs and outputs were well defined, and every
error that could possibly be returned was listed and its meaning defined in
the context of that call. (I worked mainly on RSX.) This made development
on those platforms much more straightforward, as long as you bothered to
read the manuals. Later the VMS documentation set was known as the
"Grey wall" - it was massive.

[...]
PJH> Who said I don't like it? I never said such a thing. I don't even
PJH> think it is very complex. The number 0, the strings '' and '0' and
PJH> the undef value are false, all other scalars are true.

i agree this is a simple and clear rule. why people fuss over it, i
don't know. let perl be perl. forcing a specific value to be the one
'true' false (sic :) is a waste of time and anal. it is like normalizing
for no reason and it bloats the code. even the concept of using the
result of a boolean test for its numeric or string value bothers me. a
boolean test value (of any kind in any lang) should only be used in a
boolean context. anything else is a slippery shortcut that makes the
code more complex and harder to read.

That's basically where I'm coming from - I have an immediate cringe when I
see the result of a test being used as an int. I find it odd that
normalization of bool results is built into the compiler, just so it can be
used in an integer context.

I also find insisting that !0 = 1 weird. As a statement of English
it's obviously false, and in the context of a programming language it is
simply a side-effect of the above-mentioned normalisation being applied to a
unary logical operator, and not useful in itself.

Bruce
 

Tim McDaniel

Your expectations are off: unary - does not always take the numerical
value of its argument:

$ perl -lwe '$x="blah"; print -$x;'
-blah

Perldoc perlop describes what unary - does with string arguments.

Thank you for that information. I didn't know that either. The Perl
5.00502 man page explanation:

Unary "-" performs arithmetic negation if the operand is
numeric. If the operand is an identifier, a string consisting
of a minus sign concatenated with the identifier is returned.
Otherwise, if the string starts with a plus or minus, a string
starting with the opposite sign is returned. One effect of
these rules is that -bareword is equivalent to the string
"-bareword". If, however, the string begins with a non-
alphabetic character (excluding "+" or "-"), Perl will attempt
to convert the string to a numeric and the arithmetic negation
is performed. If the string cannot be cleanly converted to a
numeric, Perl will give the warning 'Argument "the string"
isn't numeric in negation (-) at ....'
 

Peter J. Holzer

^^^^^ ^^^^^
I know you mean this in the context stated above, but it looks funny
when put together this way.
In fact, if a is equal to 1, then (a != 0) == (a == 1) is true.

Also if a is equal to 0.

But if (a != 0) and (a == 1) were the same thing it would have to be
true for all possible values of a. So let's try it with a == 2:

(2 != 0) == 1
(2 == 1) == 0

1 is not equal to zero, therefore (a != 0) is not the same as (a == 1).

QED.

hp
 

Peter J. Holzer

That's basically where I'm coming from - I have an immediate cringe when I
see the result of a test being used as an int.

I find this odd from someone who claims to have an extensive background
in assembler and early C programming. After all, in machine code
everything is just bits. And early C had inherited quite a bit (no pun
intended) from that mindset, if mostly via its typeless predecessors
BCPL and B.
I find it odd that
normalization of bool results is built into the compiler,

What "normalization of bool results is built into the compiler"?
just so it can be used in an integer context.
I also find that insisting that !0 = 1 is weird.

That's just what the ! operator is defined as in C. It is an integer
function which returns 1 if its argument is zero, and 0 otherwise.

In mathematical terms it is the Kronecker delta function with one
argument (and C's == operator is the Kronecker delta function with two
arguments).

As a statement of english it's obviously false

In English, it is just a syntax error. The exclamation point comes at
the end of a sentence, not the beginning ;-).

As a statement about the operator "!" in the programming language C it
is not obviously false, but provably true.

and in the context of a programming language is simply a side-effect
of the above mentioned normalisation being applied to a unary logical
operator

You make the mistake of assuming a priori that there must be such a
thing as a "logical" or "boolean" type. C has no such thing. The
operators !, ==, <, &&, etc. all return results of type int.
(1 == 1) does not return "true", it returns "1", just like the δ(1,1)
returns 1. You may not like that, but that's the way it is in C.
and not useful in itself.

I find it very useful that operators built into the language return a
defined result. If anything, C has too many "implementation defined" and
"undefined" corners for my taste.

hp
 

sln

Also if a is equal to 0.

But if (a != 0) and (a == 1) were the same thing it would have to be
true for all possible values of a. So let's try it with a == 2:

(2 != 0) == 1
(2 == 1) == 0

1 is not equal to zero, therefore (a != 0) is not the same as (a == 1).

QED.

hp

Nobody said anything about being true for all possible values of a.

Using that logic, if expression (a != 0) is not the same as (a == 1) then
it would be true for all possible values of a.

Since {0,1} satisfy the condition of (a != 0) == (a == 1) as true, the
answer is that (a != 0) is the same as (a == 1) sometimes, and not at other times.
Elementary quantum physics.

sln
 

Bruce Cook

Peter said:
I find this odd from someone who claims to have an extensive background
in assembler and early C programming. After all, in machine code
everything is just bits. And early C had inherited quite a bit (no pun
intended) from that mindset, if mostly via its typeless predecessors
BCPL and B.

It's basically a background thing. As you say, everything is just bits. The
earlier compilers I worked with were all 16-bit, and literally everything was
a 16-bit int: pointers, *everything* (even when chars were passed into
functions, they were passed as 16 bits (to satisfy even-byte boundary
constraints) and manipulated as 16 bits; you just ignored the top 8 bits of
the word in certain operations). To add to this, the compilers didn't do a
lot of sanity checking; the compiler just assumed you knew what you were
doing and would faithfully "just do it". Early compilers didn't have function
prototyping (a function prototype was a syntax error), and void was a keyword
introduced to the language later, so void * was unheard of in most code.
(If you wanted to check your code for bad stuff you ran lint over the
code and it would give you a whole bunch of error lines, fairly much what
you'd expect to get as warnings, and in some cases outright errors, from a
modern compiler.)

This engendered very fast and loose usage of ints for everything. In a lot
of early code you'd see pointers, chars and all sorts declared as int, and
some truly horrendous coding:

foo(a, b)
int a, b;   /* We don't know what a and b are at this point,
               just declare them int for now */
{
    struct bar *z;
    struct frotz *y;

    /* First word that a points to signals a bar or not */
    if (*(int *)a == 1) {
        z = a;
        ...
    } else {
        y = a;
        ...
    }
}

The code could have been done properly using unions; however, that was work,
and because everyone knew what was really happening in the background, why
bother?

This all came crashing down when we started porting to other platforms,
which had different architecture-driven rules.

We still didn't have compilers that would point out our stupidity, so to
make code portable took a great deal of self-discipline. One of the big
things you had to get right was data-typing integrity (even though C didn't
explicitly type anything much).

It became quite common for a project to have a project-wide header file
which defined the project's base datatypes, and one of the common ones that
turned up was:

typedef int bool;

This didn't mean bool was special; declaring it just signaled to the
programmers that they were dealing with an int that had a certain meaning.

In systems programming you would get things like this simplistic example:

bool is_rx_buffer_full(int buffer)
{
    ....
    return (qio_buffer_set[buffer]->qio_status & QIO_RX_FLAGS);
}

note that this function is declared as returning bool, which implies that
what it returns should only be used in a conditional expression. If you
tried to use it as an int, you could, but you wouldn't get what you
expected.

....
step_func= operations[buffer][is_rx_buffer_full(buffer)];
/* oh shit, we seem to have wandered off into the weeds -
no runtime bounds checking in C so it's time to fire up the
debugger to find why we're crashing. */

knowing that is_rx_buffer_full is bool, we would have done it this way:
step_func= operations[buffer][is_rx_buffer_full(buffer)?1:0];

we could have made is_rx_buffer_full() return 0 or 1, but that would have
taken extra code, and on 16 bits you simply didn't do extra code just to make
things pretty. In most cases that return value would just have been used in
a condition and so would be fine (and that's what bool in this header says
it's for). 0 or non-0 served well enough for what it was meant to accomplish.

if you wanted to use a bool as an int, you disciplined yourself to normalize
it then.

This was all done by convention - no typing in the language, but if the
programmer ignored the implied typing, you could almost guarantee there'd be
tears.


The whole industry hit the portability issue at about the same time. This
led to a lot of the modern features of C, including POSIX, function
prototypes, a lot of the standard header files, many of the standard
compiler warnings and, of course, the C standards. Others decided that C was
just stupid for portability and created their own language (DEC used BLISS,
which was an extremely strongly typed language and served them well across
many very different architectures).
What "normalization of bool results is built into the compiler"?

Consider:
c= (a || b)

as you say, these are just ints like everything else in C.
The easiest way to compile that on most architectures would be:

mov a,c
bis b,c ; bis being the '11 OR operator (src,dst - the result lands in c)

however if you need to make sure your result is 1 or 0:

mov a,c
bis b,c
beq 1$ ; result was 0, nothing to normalize
mov #1,c ; normalize !0 to 1
1$:

(obviously here I'm ignoring the architecture's implementation of the actual
storage of a, b & c, for clarity)

[...]
You make the mistake of assuming a priori that there must be such a
thing as a "logical" or "boolean" type. C has no such thing. The
operators !, ==, <, &&, etc. all return results of type int.
(1 == 1) does not return "true", it returns "1", just like the δ(1,1)
returns 1. You may not like that, but that's the way it is in C.

What I meant was literally that the construct !0 is not useful in itself.
I find it very useful that operators built into the language returned a
defined result. If anything, C has too many "implementation defined" and
"undefined" corners for my taste.

Yes, but I think it's also one of the strengths of C. You define your own
rules to make it fit your needs for a particular project, and as long as
you're consistent and design those rules properly it all works.

Modern languages try to address these undefined corners, but that often makes
them difficult to use for some applications.

Bruce
 

Hans Mulder

Peter said:
The "special false" value is an integer (the integer
zero) and it is also printable (it prints as ''), but depending on your
expectations it may not be "cleanly printable" (a "normal" integer of
value 0 prints as '0', not '').

The regular documentation does not admit it, but a Perl scalar can have
a numeric value and a string value simultaneously, and the two need not
match. The special false value has the values 0 and '':

my $false = (1 < 0);
printf "Numeric value: %d, String value: '%s'\n", $false, $false;

prints:

Numeric value: 0, String value: ''


The magic variable $! tends to contain more wildly mismatching values.
For example:

open my $fh, '<', "blah blah blah";
printf "Numeric value: %d, String value: '%s'\n", $!, $!;

prints (on my system):

Numeric value: 2, String value: 'No such file or directory'

(Both string and numeric values of $! are OS dependent.)


If you want to create your own mismatching scalars, you can use the
function dualvar in the module Scalar::Util

use Scalar::Util 'dualvar';

my $t = dualvar(1, "True");
printf "Numeric value: %d, String value: '%s'\n", $t, $t;

prints:

Numeric value: 1, String value: 'True'

Incidentally, 'dualvar' is a misnomer: it returns dual values, not dual
variables.

The function Dump in the module Devel::Peek is the only one I'm aware of
that will accurately report the value(s) of such a dual scalar.

[.....]
In perl, there are only scalars, so you have to tell it whether you
want a string or a numeric comparison[1].
[1] Strictly speaking the interpreter could figure it out at run-time.

The bitwise operators (&, |, ^ and ~) do exactly that. They check at
runtime whether their operands have a string value or a numeric
value and behave accordingly:

my $x = 6;
my $y = 12;

printf "Numeric bitwise or: %d\n", $x | $y ;
printf "String bitwise or: %s\n", "$x" | "$y";

prints:

Numeric bitwise or: 14
String bitwise or: 72
But since the type (NV, IV, UV, PV, or a combination) can be changed
just by using a variable, this would be extremely confusing.

There's that.


Hope this helps,

-- HansM
 

Bruce Cook

Hans said:
The regular documentation does not admit it, but a Perl scalar can have
a numeric value and a string value simultaneously, and the two need not
match. The special false value has the values 0 and '':

my $false = (1 < 0);
printf "Numeric value: %d, String value: '%s'\n", $false, $false;

prints:

Numeric value: 0, String value: ''


The magic variable $! tends to contain more wildly mismatching values.
For example:

open my $fh, '<', "blah blah blah";
printf "Numeric value: %d, String value: '%s'\n", $!, $!;

prints (on my system):

Numeric value: 2, String value: 'No such file or directory'

I hadn't realized this, that's a really neat feature !
(Both string and numeric values of $! are OS dependent.)


If you want to create your own mismatching scalars, you can use the
function dualvar in the module Scalar::Util

use Scalar::Util 'dualvar';

my $t = dualvar(1, "True");
printf "Numeric value: %d, String value: '%s'\n", $t, $t;

prints:

Numeric value: 1, String value: 'True'

Incidentally, 'dualvar' is a misnomer: it returns dual values, not dual
variables.

I assume it knows about numeric vs string operators (== vs eq) and would
use the string value on assignment?

nice, you learn something every day.

I was aware of functions that returned either scalar or array depending upon
how they're called:
@foo= (scalar) time(); #Returns a single epoch
vs @foo= time(); # returns the components hour:min:sec etc


Bruce
 

Tad J McClellan

Bruce Cook said:
Hans Mulder wrote:



I hadn't realized this, that's a really neat feature !


For those playing along at home, the "irregular" documentation :)
admits this duality in the "Double-Typed SVs" section of perlguts.pod.

I assume it knows about numeric vs string operators (== vs eq)


Yes. These both make output:

print "numeric true\n" if $t == 10 - 9;
print "string true\n" if $t eq 'Tr' . 'ue';

and would
use the string value on assignment ?


Err, no, the duality is preserved on assignment:

my $other = $t;
printf "Numeric value: %d, String value: '%s'\n", $other, $other;

I was aware of functions that returned either scalar or array


Functions return either a scalar or a *list*.

depending upon
how they're called:
@foo= (scalar) time(); #Returns a single epoch
vs @foo= time(); # returns the components hour:min:sec etc


Assignment determines the context of its RHS from whatever is
on its LHS (see "Context" in perldata.pod) so your 1st example
above can be written more simply:

$foo = time(); #Returns a single epoch
 

Brad Baxter

Thank you for that information.  I didn't know that either.  The Perl
5.00502 man page explanation:

       Unary "-" performs arithmetic negation if the operand is
       numeric.  If the operand is an identifier, a string consisting
       of a minus sign concatenated with the identifier is returned.
       Otherwise, if the string starts with a plus or minus, a string
       starting with the opposite sign is returned.  One effect of
       these rules is that -bareword is equivalent to the string
       "-bareword".  If, however, the string begins with a non-
       alphabetic character (excluding "+" or "-"), Perl will attempt
       to convert the string to a numeric and the arithmetic negation
       is performed.  If the string cannot be cleanly converted to a
       numeric, Perl will give the warning 'Argument "the string"
       isn't numeric in negation (-) at ....'

Just some observations:

From: http://perldoc.perl.org/perldata.html

Barewords

A word that has no other interpretation in the grammar will be
treated as if it were a quoted string. These are known as "barewords".
As with filehandles and labels, a bareword that consists entirely
of lowercase letters risks conflict with future reserved words, and
if you use the use warnings pragma or the -w switch, Perl will warn
you about any such words.

$ perl -v
This is perl, v5.8.8 built for sun4-solaris

$ perl -wle'$_=bareword;print'
Unquoted string "bareword" may clash with future reserved word at -e
line 1.
bareword

(as predicted)

$ perl -wle'$_=-bareword;print'
-bareword

$ perl -wle'$_= - -bareword;print'
+bareword

(unary "-" silences the warning, I guess)

$ perl -wle'$_= - +bareword;print'
Unquoted string "bareword" may clash with future reserved word at -e
line 1.
-bareword

$ perl -wle'$_=+bareword;print'
Unquoted string "bareword" may clash with future reserved word at -e
line 1.
bareword

(unary "+" doesn't appear to silence the warning)

$ perl -wle'$_=_bareword;print'
_bareword

(i.e., it does not consist of all lowercase letters, so no warning)

$ perl -wle'$_=-_bareword;print'
-_bareword

(hmmm, so 'If, however, the string begins with a non-
alphabetic character (excluding "+" or "-"), Perl will attempt
to convert the string to a numeric and the arithmetic negation
is performed.' should maybe include 'or begins with "_"'
somewhere?)

$ perl -wle'$_= - -_bareword;print'
+_bareword

$ perl -wle'$_= - +_bareword;print'
-_bareword

$ perl -wle'$_=+_bareword;print'
_bareword
 

Peter J. Holzer

It's basically a background thing. As you say, everything is just bits. The
earlier compilers I worked with were all 16-bit, and literally everything was
a 16-bit int: pointers, *everything*.

floats and longs weren't, I hope.
(even when chars were passed into functions, they were passed as 16 bits
(to satisfy even-byte boundary constraints),

char (or more precisely any integral type smaller than int) is promoted
to (unsigned) int when passed to a function without a prototype. This is
still the case in C.
and manipulated as 16-bit values; you just ignored the top 8 bits of the
word in certain operations.

In arithmetic expressions, char (or more precisely any integral type
smaller than int) is promoted to (unsigned) int. This is still the case
in C.
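The promotion rule Peter describes can be checked directly with `sizeof`; a minimal sketch (standard C, nothing platform-specific assumed):

```c
#include <assert.h>

/* Integral promotion: operands of type char are promoted to int
 * before arithmetic is performed, so the expression a + b has
 * type int even though both operands are char. */
int char_sum_is_int_sized(void) {
    char a = 'x';
    char b = 'y';
    return sizeof(a + b) == sizeof(int);
}
```

This holds on every conforming implementation, because the type of `a + b` is `int` regardless of how wide `char` and `int` happen to be.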
To add to this, the compilers didn't do a lot of sanity checking, the
compiler just assumed you knew what you were doing and would
faithfully "just do it".

That's what lint was for, as you note below. If you have only 64 kB
address space, you want to keep your programs small.
Early compilers didn't have function prototyping (a function prototype
was a syntax error),

Prototypes were originally a feature of C++ and were introduced into C
with the C89 standard. I think I've used compilers which supported them
before that (gcc, Turbo C, MSC, ...) but it's too long ago for me to be
certain.
void was a keyword introduced to the language later, so void * was
unheard of in most code.

About the same time as prototypes, although I don't think I've ever used
a C compiler which didn't support it, while I've used several which
didn't support prototypes.
This engendered very fast and loose usage of ints for everything. In a lot
of early code you'd see pointers, chars and all sorts declared as int and
some truly horrendous coding: [...]
Code could have been done properly using unions, however that was work and
because everyone knew what was really happening in the background why
bother?

This all came crashing down when we started porting to other platforms,
which had different architecture driven rules.

As they say, "port early, port often". Thankfully I was exposed to the
VAX and 68k-based systems shortly after starting to program in C, so I
never got into the "I know what the compiler is doing" mindset and
rather quickly got into the "if it isn't documented/specified you cannot
rely on it" mindset.

It became quite common for a project to have a project-wide header file
which defined the project's base datatypes and one of the common ones that
turned up was:

typedef int bool;

This didn't mean bool was special, declaring it just signaled to the
programmers that they were dealing with an int that had certain meaning.

That's a good thing if the "certain meaning" is documented and strictly
adhered to.

In systems programming you would get things like this simplistic example:

bool is_rx_buffer_full(int buffer) {
    ....
    return (qio_buffer_set[buffer]->qio_status & QIO_RX_FLAGS);
}

So a "bool" isn't a two-valued type - it can take many values. This is
not what I expect from a boolean type.
Note that this function is declared as returning bool, which implies that
what it returns should only be used in a conditional expression. If you
tried to use it as an int, you could, but you wouldn't get what you
expected.

Actually I would get what I expect if I treat your "bool" as an int, but
not what I expect when I treat your "bool" as what I expect from a
boolean type.

Expectations differ.
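The mismatch is easy to demonstrate. A minimal sketch, assuming made-up names (the flag mask and `is_rx_full` are hypothetical stand-ins for the QIO example; the typedef is spelled `bool_t` here only to avoid colliding with C99's `<stdbool.h>`):

```c
#include <assert.h>

/* The project-wide "bool" convention described above: an int whose
 * meaning is zero/not-zero, not strictly 0 or 1. */
typedef int bool_t;

#define RX_FLAGS 0x30  /* hypothetical receive-flag mask */

bool_t is_rx_full(int status) {
    /* Any non-zero masked value means "full" - not necessarily 1 */
    return status & RX_FLAGS;
}
```

`if (is_rx_full(s))` behaves as the author intended, while `if (is_rx_full(s) == 1)` silently fails for a status of 0x20, which is exactly the expectation mismatch being discussed.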

So documentation is very important and this is (to get back to Perl) why
I criticised that the "true" return value of several operators in Perl
is not documented.

The whole industry hit the portability issue at about the same time. This
led to a lot of the modern features of C, including POSIX, function
prototypes, a lot of the standard header files, many of the standard
compiler warnings and of course the C standards. Others decided that C was
just stupid for portability and created their own language (DEC used BLISS,
which was an extremely strongly typed language and served them well across
many very different architectures)

Actually, BLISS is older than C, so it can't have been developed because
people were disappointed by C. Also, according to Wikipedia, BLISS was
typeless, not "extremely strongly typed".


Consider:
c= (a || b)

as you say, these are just ints like everything else in C.
Easiest way to compile that on most architectures would be:

mov a,c
bis b,c ; bis being the '11 OR operator

Not generally, because

* || is defined to be short-circuiting, so it MUST NOT evaluate
b unless a is false.
* a and b need not be integer types.

And of course the result of the operation is defined as being 0 or 1.


I don't see this as "normalisation", because there is no intermediate
step which does a bit-wise or.

c = (a || b)

is semantically exactly equivalent to

c = (a != 0 ? 1
: (b != 0 ? 1
: 0))

It is not equivalent to

c = ((a | b) != 0 ? 1 : 0)

(Of course in some situations an optimizing compiler may determine that
it is equivalent in this specific situation and produce the same code)
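Peter's distinction can be checked directly; a minimal sketch:

```c
/* Logical OR normalises its result to exactly 0 or 1; bitwise OR
 * preserves the bit pattern. The two are not interchangeable. */
int logical_or(int a, int b) { return a || b; }
int bitwise_or(int a, int b) { return a | b; }
```

For operands 4 and 2, the logical form yields 1 while the bitwise form yields 6, so no intermediate bitwise-or step can be involved in `||`.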

Yes, but I think it's also one of the strengths of C. You define your own
rules to make it fit to your needs for a particular project and as long as
you're consistent and design those rules properly it all works.

Modern languages try to address these undefined corners, but it often makes
them difficult to use for some applications.

I strongly disagree with this. The various implementation defined and
undefined features of C don't make it simpler for the application
programmer - on the contrary, they make it harder, sometimes a lot
harder. What they do simplify (and in some cases even make possible) is
to write an efficient compiler for very different platforms.

hp
 

Bruce Cook

Peter said:
floats and longs weren't, I hope.

Where floats existed, they weren't :)
In some cases, longs were; however, you're right, mostly longs were just
bigger ints.
char (or more precisely any integral type smaller than int) is promoted
to (unsigned) int when passed to a function without a prototype. This is
still the case in C.


In arithmetic expressions, char (or more precisely any integral type
smaller than int) is promoted to (unsigned) int. This is still the case
in C.


That's what lint was for, as you note below. If you have only 64 kB
address space, you want to keep your programs small.


Prototypes were originally a feature of C++ and were introduced into C
with the C89 standard. I think I've used compilers which supported them
before that (gcc, Turbo C, MSC, ...) but it's too long ago for me to be
certain.

Turbo C only gained prototypes during the C89 pre-standards releases. I
remember seeing that come in in a Turbo C release and thinking "that would
be nice if only it were portable" :)
About the same time as prototypes, although I don't think I've ever used
a C compiler which didn't support it, while I've used several which
didn't support prototypes.

Most of the work I was doing on these platforms was from '81 to '94, so most
of it was pre-standards, and even after '89 it took a while for compilers
that supported C89 properly to become available on these platforms.
This engendered very fast and loose usage of ints for everything. In a
lot of early code you'd see pointers, chars and all sorts declared as int
and some truely horrendous coding: [...]
Code could have been done properly using unions, however that was work
and because everyone knew what was really happening in the background why
bother?

This all came crashing down when we started porting to other platforms,
which had different architecture driven rules.

As they say, "port early, port often". Thankfully I was exposed to the
VAX and 68k-based systems shortly after starting to program in C, so I
never got into the "I know what the compiler is doing" mindset and
rather quickly got into the "if it isn't documented/specified you cannot
rely on it" mindset.

Yes, a pair of reasonably disparate architectures; that would have made
quite a few of the porting issues obvious from the start.
It became quite common for a project to have a project-wide header file
which defined the projects' base datatypes and one of the common ones
that turned up was:

typedef int bool;

This didn't mean bool was special, declaring it just signaled to the
programmers that they were dealing with an int that had certain meaning.

That's a good thing if the "certain meaning" is documented and strictly
adhered to.

In systems programming you would get things like this simplistic example:

bool is_rx_buffer_full(int buffer) {
    ....
    return (qio_buffer_set[buffer]->qio_status & QIO_RX_FLAGS);
}

So a "bool" isn't a two-valued type - it can take many values. This is
not what I expect from a boolean type.

Yes, agreed. It was quite common though for people to treat bools as
zero/not-zero in the same manner that C used false.
Actually I would get what I expect if I treat your "bool" as an int, but
not what I expect when I treat your "bool" as what I expect from a
boolean type.

Expectations differ.

So documentation is very important and this is (to get back to Perl) why
I criticised that the "true" return value of several operators in Perl
is not documented.



Actually, BLISS is older than C, so it can't have been developed because
people were disappointed by C. Also, according to Wikipedia, BLISS was
typeless, not "extremely strongly typed".

True, DEC came to BLISS independently. There was pressure from the Ultrix
team to port applications to C; however, it was decided that C didn't have
the built-in language structure to achieve the system independence that
BLISS gave them.

The BLISS I've seen is very explicit about its types and the behavior
expected of the types. I believe (and I am not a BLISS expert) that there
was an architecture file for each system and data type definitions (some
standard, some application-specific) that specified expected behavior, such
as size, overflow, endianness, etc. One of the jobs the BLISS compiler had
was to make sure that data types used on an architecture aligned with what
the application was expecting.
Not generally, because

* || is defined to be short-circuiting, so it MUST NOT evaluate
b unless a is false.

This would only apply if the b portion was more than testing an int (which
in the case I used is all it is). It is only actually important if b is
something with a side effect, such as an assignment, auto-increment or
function call.
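Bruce's point that short-circuiting is only observable when the right operand has a side effect can be sketched with a counting function:

```c
/* A right-hand operand with a visible side effect: each call is
 * counted, so we can tell whether || actually evaluated it. */
static int calls = 0;

int rhs(void) {
    calls++;
    return 1;
}
```

With a true left operand, `a || rhs()` never calls `rhs()`, so `calls` stays at 0; with a false left operand it is called exactly once. For plain int operands with no side effects, the short-circuit is indeed invisible.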
* a and b need not be integer types.

True, but it is what I'm giving as an example of the normalisation.
And of course the result of the operation is defined as being 0 or 1.

I think that was only after C89; I can remember it being one of the points
that was argued over in the pre-standardisation process.
I don't see this as "normalisation", because there is no intermediate
step which does a bit-wise or.

c = (a || b)

is semantically exactly equivalent to

c = (a != 0 ? 1
: (b != 0 ? 1
: 0))

It is not equivalent to

c = ((a | b) != 0 ? 1 : 0)

I understand what you're saying; however, if you are instead thinking in
the mindset of a bool being an int that is zero/not-zero, it actually is.
This is part of the reason for that particular standardisation argument:
some of the participants didn't get the prevailing mentality of the
language at that time.

It'd be interesting to get comp.lang.c archives from '86-'89 to see what
was being said at the time. I will try to find my K&R1, which I'm fairly
sure said nothing about normalisation.
(Of course in some situations an optimizing compiler may determine that
it is equivalent in this specific situation and produce the same code)

In fact I think you'd find that this was the default way that a lot of the
compilers would have compiled an integer logical or operation in those days.
 
