return -1 using size_t???


Keith Thompson

Malcolm McLean said:
No, there isn't.

I don't recall the details from that thread.  The point is that size_t
simply cannot hold a negative number.

Here it is
[ from Ben ]
size_t i = strlen(s);
while (i-- > 0 && isspace((unsigned char)s[i]))
    /* do nothing */;
s[i + 1] = '\0';

If the string is all spaces, or empty, I'd say that i goes to -1.


No, it goes to SIZE_MAX; it cannot represent the value -1.
 

Eric Sosman

Thank you for the clear explanation. I will not use a size_t to hold a
negative number.

This explanation may seem clear but it is also WRONG!

Although I see no flaws, it's possible I've overlooked something.
I'll admit to a certain breeziness and glossing over of niggling
detail in the interest of clarity, but I feel that's acceptable and
that any such hand-waving is far from WRONG.
It is assuming that the CPU is 2-s compliment.

No, nor does it assume two's complEment.
In many common cases eg on x86, behavior will be as Eric Sossman
describes.

Correct (except that your S key sticks). "Many common cases"
is a subset of "all cases," so a condition that holds for the latter
also holds for the former.
However, this is not guaranteed and not portable.

True: The behavior is not portable to a system lacking C.
On other
platforms it could trigger an overflow signal from the ALU or lead to a
Trap Representation. In fact, Undefined Behavior means that anything
could happen! Including (in theory) reformatting your hard disk.

Only if the platform runs C code on a non-C implementation. If
it uses a C implementation there is an implementation-defined aspect
to the conversion (the maximum size_t value is implementation-dependent),
but there is no undefined behavior, no trap representation, and no
reformatting of your disk, whatever its hardness.
A size_t is an UN-SIGNED type, trying to store a negative number in it is
a programmer error generating an undefined behavior.

Alas, you are WRONG. A few references for remedial review:

6.2.5p9: "[...] A computation involving unsigned operands can
never overflow, because a result that cannot be represented by
the resulting unsigned integer type is reduced modulo the
number that is one greater than the largest value that can
be represented by the resulting type."

6.2.6.2p1, footnote 53: "[...] no arithmetic operation on
valid values can generate a trap representation other than
as part of an exceptional condition such as an overflow, and
this cannot occur with unsigned types."

6.3.1.3p2: "[converting out-of-range integers ...] if the
new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the
range of the new type."

If you believe these three citations are WRONG, I encourage you
to file a defect report with ISO.
 

Keith Thompson

Devil with the China Blue Dress said:
[...]
For all conforming C implementations the result of converting -1 to
size_t is SIZE_MAX.

In 4-bit ones complement, -1 is 1110, but the maximum natural number
is 1111; signed magnitude are 1001 and 1111; twos complement are 1111
and 1111.

Yes, in 4-bit ones'-complement -1 is represented by the bit pattern 1110,
but 1111 would be the representation for negative zero.
Which came first? Twos complement DEC, Honeywell, and IBM machines or
the C standard which emulated C implementations on twos complement
CPUs? Or did the committee just magically pluck that requirement out
of the air?

Clearly those machines that use[d] a two's-complement
representation for signed types predate the C standard. C's rules
for signed-to-unsigned conversion are easiest to implement on a
system that uses two's-complement, and require some extra work on
ones'-complement and sign-and-magnitude systems (which are rare
these days).

It seems clear that two's-complement semantics inspired standard C
semantics. Nevertheless, C's conversion semantics are now *defined*
without reference to the underlying representation.

Which raises a perhaps interesting question. Are there any current C
implementations for ones'-complement or sign-and-magnitude systems?
If so, what standard, if any, do they conform to, and how much
extra work do the compilers actually do to conform to the required
semantics for signed-to-unsigned conversion? For that matter,
are there non-conforming C implementations for such systems that
implement such conversions just by reinterpreting the bits?
 

Jeroen Mostert

It seems clear that two's-complement semantics inspired standard C
semantics. Nevertheless, C's conversion semantics are now *defined*
without reference to the underlying representation.

Which raises a perhaps interesting question. Are there any current C
implementations for ones'-complement or sign-and-magnitude systems?

Let's not stop there -- I'd be interested in a list of C implementations,
current *and* historical, for architectures that use one's complement or
sign-and-magnitude. Both of them were already in the minority by the time C
rolled around.

The only platform I've been able to find is Univac 1100, which was ones'
complement and presumably had a C compiler (since it had a Unix port). Of
course, all of this predates the ANSI standard by a good deal and may
well have predated K&R C. Pre-K&R C didn't even have "unsigned", so
presumably the representation and interpretation of integers was all up for
grabs.
 

Jens Gustedt

On 02/12/2012 06:21 PM, Devil with the China Blue Dress wrote:
Jens Gustedt said:
No, the question was about casting a signed number (-1) to an unsigned
type.
That this can be done and is closed under addition is a property of twos
complement integers.

No, the representation of -1 has nothing to do with the result of
converting it into any unsigned integer type. As has already been
repeated several times upthread, C strictly defines what has to be done
when converting a negative *value* to an unsigned integer type. And in
particular it uses the value and not the representation.

For all conforming C implementations the result of converting -1 to
size_t is SIZE_MAX.

In 4-bit ones complement, -1 is 1110, but the maximum natural number is 1111;
signed magnitude are 1001 and 1111; twos complement are 1111 and 1111.

Frankly, I don't see the relation between your sentence and the one from
me that you are citing above. As I said, conversion from signed to
unsigned has, by definition in the standard, nothing to do with the
representation of the number, only with its value.

Jens
 

Eric Sosman

Jens Gustedt said:
<[email protected]>
wrote:
[...]
For all conforming C implementations the result of converting -1 to
size_t is SIZE_MAX.

In 4-bit ones complement, -1 is 1110, but the maximum natural number is
1111;
signed magnitude are 1001 and 1111; twos complement are 1111 and 1111.

Frankly, I don't see the relation between your sentence and the one from
me that you are citing above. As I said, conversion from signed to
unsigned has, by definition in the standard, nothing to do with the
representation of the number, only with its value.

Thank heavens we sell software to France instead of buying it.

Jens is right. You are wrong. You are also boorish. Shut up.
 

Ben Bacarisse

Keith Thompson said:
Which raises a perhaps interesting question. Are there any current C
implementations for ones'-complement or sign-and-magnitude systems?
If so, what standard, if any, do they conform to, and how much
extra work do the compilers actually do to conform to the required
semantics for signed-to-unsigned conversion? For that matter,
are there non-conforming C implementations for such systems that
implement such conversions just by reinterpreting the bits?

There is one that does both (selectable as an option). I just happened
to be reading a Unisys C manual[1] recently. It's part of MCP: Unisys's
OS that runs on big machines derived from a Burroughs architecture. It
has 48-bit words and 8-bit bytes.

sizeof char/unsigned char/signed char is 1
sizeof long double is 12
sizeof everything else is 6

Despite having 48 bits, all integer types have 39 value bits. Signed
types have, in addition, a sign bit (presumably the 40th). The
remaining bits are presumably related to the tagged architecture that
Burroughs machines have always had and, as far as C goes, will be
padding bits. These machines will definitely have trap representations,
since some sets of these tag bits are designed to cause traps in certain
situations.

The C implementation has INT_MAX == UINT_MAX and INT_MIN == -INT_MAX and
uses sign-magnitude representation. It supports ANSI C and it documents
the standard conversion from signed to unsigned types but, later on, in
a section on porting software it says:

Operations on unsigned integer types are more expensive than on signed
types. The $RESET PORT (UNSIGNED) option makes unsigned equivalent to
signed types and should be used on programs that do not depend upon
the wraparound or bit operation properties of unsigned types.

The document for this options says:

The UNSIGNED option controls the semantics of the sign attribute for
integer types. If this option is disabled, unsigned integers are
treated as signed integers; that is, normal signed-magnitude
arithmetic is performed. If the option is enabled, unsigned integers
are emulated as two's-complement quantities. Enabling this option
could slow down performance and should be used only when absolutely
necessary.

For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Given that the C standard has usually avoided mandating behaviour that
is expensive on some hardware, the behaviour of unsigned is an anomaly.
There was, however, little option since K&R mandated that behaviour long
before ANSI C.

[1] http://public.support.unisys.com/aseries/docs/clearpath-mcp-13.0/pdf/86002268-204.pdf

[2] Though I think you can switch to ASCII if your program needs it.
 

Jeroen Mostert

Keith Thompson said:
Which raises a perhaps interesting question. Are there any current C
implementations for ones'-complement or sign-and-magnitude systems?
If so, what standard, if any, do they conform to, and how much
extra work do the compilers actually do to conform to the required
semantics for signed-to-unsigned conversion? For that matter,
are there non-conforming C implementations for such systems that
implement such conversions just by reinterpreting the bits?

There is one that does both (selectable as an option). I just happened
to be reading a Unisys C manual[1] recently. It's part of MCP: Unisys's
OS that runs on big machines derived from a Burroughs architecture. It
has 48-bit words and 8-bit bytes.
For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.
It's like discovering the coelacanth in your bathtub. For Christmas, I'm
totally buying one of these instead of a second DeathStation 9000 (I may try
networking it with my original DS9K).

The most interesting thing is that recent generations of the hardware are
based on Intel processors, meaning that all this hoopla is emulated and kept
around for legacy software. The companies who use these systems must be at
least as suitable for historical studies as the system itself.
 

Ike Naar

There is one that does both (selectable as an option). I just happened
to be reading a Unisys C manual[1] recently. It's part of MCP: Unisys's
OS that runs on big machines derived from a Burroughs architecture. It
has 48-bit words and 8-bit bytes.

sizeof char/unsigned char/signed char is 1
sizeof long double is 12
sizeof everything else is 6

Despite having 48 bits, all integer types have 39 value bits. Signed
types have, in addition, a sign bit (presumably the 40th). The
remaining bits are presumably related to the tagged architecture that
Burroughs machines have always had and, as far as C goes, will be
padding bits. These machines will definitely have trap representations,
since some sets of these tag bits are designed to cause traps in certain
situations.

Having worked with these systems in the early 1980's, I remember
that integers are just floating point numbers where the 8-bit exponent
contains all zeroes, and the (1+39)-bit mantissa holds the value.
The four tag bits (3-bit tag plus 1-bit parity) are not visible to the
programmer. So the hardware actually uses 52-bit words.
 

Nick Keighley

CDC machines were all ones' complement until the Cyber 180 and 200, but C
is one of the few languages that was not ported to them. There was SYMPL,
based on Jovial, which was based on Algol 58 and had some similarities to B.

ICL 1900 series machines were ones' complement, but I can find no sign
that there was a C compiler for them.
 

Angel

For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Just out of curiosity, is there any chance that this system can run
GNU/Linux?
 

Ben Bacarisse

Ike Naar said:
There is one that does both (selectable as an option). I just happened
to be reading a Unisys C manual[1] recently. It's part of MCP: Unisys's
OS that runs on big machines derived from a Burroughs architecture. It
has 48-bit words and 8-bit bytes.

sizeof char/unsigned char/signed char is 1
sizeof long double is 12
sizeof everything else is 6

Despite having 48 bits, all integer types have 39 value bits. Signed
types have, in addition, a sign bit (presumably the 40th). The
remaining bits are presumably related to the tagged architecture that
Burroughs machines have always had and, as far as C goes, will be
padding bits. These machines will definitely have trap representations,
since some sets of these tag bits are designed to cause traps in certain
situations.

Having worked with these systems in the early 1980's, I remember
that integers are just floating point numbers where the 8-bit exponent
contains all zeroes, and the (1+39)-bit mantissa holds the value.

Ah, that makes sense. There were, I think, CDC machines that did that
too. Do you know what happened when exponent bits got set (and indeed
if they could get set) by operations like ~?
 

Ralph Spitzner

Eric Sosman wrote:
[...]
all bits set, and interpreted that as the int value minus one. What
happened to the other half remains shrouded in mystery: Maybe it
just sat in a CPU register and was ignored, maybe it hung around in
a stack slot ready to confuse the next conversion, maybe it donated
your bank balance to the Greek bailout fund.

Can you rephrase/downsize that statement, so we could all use it
as a signature?

:p

-rasp
 

Ben Bacarisse

Angel said:
For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Just out of curiosity, is there any chance that this system can run
GNU/Linux?

It would surely be hard and it might be impossible, but why would you?
The only reason to use the clunky system is that you have old software
that requires this environment.

As someone pointed out, the modern versions of this architecture are
built using Intel Xeon CPUs, so if you had one of these machines and
needed GNU/Linux you'd start by throwing all the MCP emulation away.
 

Angel

Angel said:
For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Just out of curiosity, is there any chance that this system can run
GNU/Linux?

It would surely be hard and it might be impossible, but why would you?
The only reason to use the clunky system is that you have old software
that requires this environment.

I don't need reasons per se. I run Linux on old clunky stuff because I
can. :)
As someone pointed out, the modern versions of this architecture are
built using Intel Xeon CPUs, so if you had one of these machines and
needed GNU/Linux you'd start by throwing all the MCP emulation away.

That would make it just Linux on Intel. Been there, done that. :)

Like I said, I was just curious. I've got Linux running on Intel, Sparc
and PowerPC, I'm always looking for a new challenge. :)
 

James Kuyper

Angel said:
For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Just out of curiosity, is there any chance that this system can run
GNU/Linux?

It would surely be hard and it might be impossible, but why would you?
The only reason to use the clunky system is that you have old software
that requires this environment.

I thought you said this was "current"? That's a rather less impressive
example if it is indeed a clunky old machine, rather than one that is
still in sufficiently active use to require new supported releases.
 

Joe Pfeiffer

Ben Bacarisse said:
Ike Naar said:
There is one that does both (selectable as an option). I just happened
to be reading a Unisys C manual[1] recently. It's part of MCP: Unisys's
OS that runs on big machines derived from a Burroughs architecture. It
has 48-bit words and 8-bit bytes.

sizeof char/unsigned char/signed char is 1
sizeof long double is 12
sizeof everything else is 6

Despite having 48 bits, all integer types have 39 value bits. Signed
types have, in addition, a sign bit (presumably the 40th). The
remaining bits are presumably related to the tagged architecture that
Burroughs machines have always had and, as far as C goes, will be
padding bits. These machines will definitely have trap representations,
since some sets of these tag bits are designed to cause traps in certain
situations.

Having worked with these systems in the early 1980's, I remember
that integers are just floating point numbers where the 8-bit exponent
contains all zeroes, and the (1+39)-bit mantissa holds the value.

Ah, that makes sense. There were, I think, CDC machines that did that
too. Do you know what happened when exponent bits got set (and indeed
if they could get set) by operations like ~?

Well, sort-of. They did use a representation in which having the sign
and exponent bits all 0 or all 1 would give the same result as an
integer, but they had separate integer and floating point instructions
so an integer add (for instance) could use all 60 bits for value.

The one exception to that rule was multiplication: there were only
floating point multiplication instructions, but the "double precision"
multiply would correctly multiply two integers, provided the result fit
in 48 (iirc) bits.

I've come across the claim many times that this was in fact an accident
that wasn't noticed until the machine was in production, but haven't
come across any substantiation of it. I don't think Thornton's book
mentions that you can do this (but could easily be mis-remembering).
 

Ben Bacarisse

James Kuyper said:
Angel said:
For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Just out of curiosity, is there any chance that this system can run
GNU/Linux?

It would surely be hard and it might be impossible, but why would you?
The only reason to use the clunky system is that you have old software
that requires this environment.

I thought you said this was "current"? That's a rather less impressive
example if it is indeed a clunky old machine, rather than one that is
still in sufficiently active use to require new supported releases.

I did. It is current. It has supported releases (the latest into 2013
at least). Is that inconsistent with it being a "clunky old system"? (I
didn't say "machine".)
 

James Kuyper

I did. It is current. It has supported releases (the latest into 2013
at least). Is that inconsistent with it being a "clunky old system"? (I
didn't say "machine".)

I wouldn't call it "current" if it's only used by those with a pressing
need to run "old software". One term I've often heard used for such
systems is "legacy".

I own an old program called MaxThink that I'd love to be able to run
that requires (IIRC) MS-DOS 2.2 - it had some features that no more
modern software I've seen matches, and I have a lot of old files stored
in MaxThink format. If I still owned an MS-DOS 2.2 system I could run it
on, I wouldn't call it a "current" system, even if MS chose to still
provide support for such systems.

Of course, the original disks became corrupt even before I got rid of my
last floppy disk drive. It wasn't copy protected, and I made working
copies to several different locations, but all of those copies are
currently inaccessible for one reason or another, and the relevant
backup disks are unreadable. Therefore, even if I could copy it to an
emulated MS-DOS environment (I did once have it working under dosemu),
it wouldn't do me any good. Very annoying.
 

Keith Thompson

William Ahern said:
Ben Bacarisse said:
Angel said:
For those who wonder about all the oddities in the standard, here is a
system that (a) uses EBCDIC[2]; (b) has unusual sizes and limits; (c) is
not two's complement; (d) has padding bits in all integer types except
the char types; (e) has trap representations in these types; and (f) has
multiple pointer representations (so pointer casts often generate code).
What's more it's *current*: there was recently an MCP 13.0 release with
support into 2013.

Just out of curiosity, is there any chance that this system can run
GNU/Linux?
It would surely be hard and it might be impossible, but why would you?
The only reason to use the clunky system is that you have old software
that requires this environment.

The OpenBSD team diligently maintains their VAX port just to keep themselves
honest. One of their rationalizations is that supporting older architectures
makes it easier to adapt to future architectures by exposing faulty
assumptions in the code.

Good point -- but the VAX architecture is much closer to most modern
systems than the one Ben describes: 32-bit int with no padding bits, all
pointers are 32 bits with the same representation, ASCII character set.
(VMS differs quite a bit from POSIX systems, but (a) not as much as some
other operating systems, and (b) that's not relevant for OpenBSD).
 
