Restricted unsigned integer range would have been better?

A

Ansel

If C's unsigned integer types were designed so that their maximum value was
the same as the absolute value of the same width signed integer type's
minimum value, wouldn't that automagically eliminate a bunch of erroneous
code and simplify programming in C?
 
N

Nobody

If C's unsigned integer types were designed so that their maximum value
was the same as the absolute value of the same width signed integer type's
minimum value, wouldn't that automagically eliminate a bunch of erroneous
code

Maybe, although it would probably introduce at least as much erroneous
code as it eliminated.
and simplify programming in C?

No.
 
B

BartC

Ansel said:
If C's unsigned integer types were designed so that their maximum value was
the same as the absolute value of the same width signed integer type's
minimum value, wouldn't that automagically eliminate a bunch of erroneous
code and simplify programming in C?

Assuming an 8-bit int type, the normal signed/unsigned ranges might be -128
to +127, and 0 to 255.

You're suggesting the unsigned range ought to be 0 to 128?

That sounds very difficult to implement; what happens if you have 128
unsigned, and add one? Every arithmetic operation would need extra code to
detect the overflow, and convert numbers such as +129 to 0. With a few
special cases such as 128+128.
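
A rough sketch of that extra code, assuming a made-up "restricted" 8-bit
type that wraps at 128 (the names and the 0..128 wrapping rule are invented
for illustration; nothing here is real C):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 8-bit unsigned restricted to 0..128 (129 values).  A
   wrapping add has to reduce modulo 129 by hand, instead of getting the
   modulo-256 wrap-around for free from the hardware. */
#define RU8_MAX 128u

static uint8_t ru8_add(uint8_t a, uint8_t b)
{
    unsigned sum = a + b;            /* worst case 128 + 128 = 256 */
    if (sum > RU8_MAX)
        sum -= RU8_MAX + 1;          /* 129 wraps to 0, 256 to 127 */
    return (uint8_t)sum;
}

int main(void)
{
    printf("%u\n", ru8_add(128, 1));    /* prints 0   */
    printf("%u\n", ru8_add(128, 128));  /* prints 127 */
    return 0;
}

So every add picks up a compare and a branch that ordinary unsigned
arithmetic doesn't need.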

You would also be wasting most of the top bit (only used when representing
128). Then there are the illegal values 129 to 255, which uninitialised data
is likely to have.

And then, with the char type, you would also be limited to a maximum of
+128, cutting out most of the extra ANSI characters.

In fact, every piece of coding relying on a byte having a range of 0 to 255
(as used in much of the internet for example), would need rewriting to use a
wider int to cover that range.

And what happens when, for example, reading a file of 8-bit characters which
have been created when the range allowed +130 to 255 (eg. much of the
world's stored data); you would now have a block of illegal values instead.

So I can see a few snags with the idea.
 
A

Ansel

BartC said:
Assuming an 8-bit int type,

I was actually thinking about every width *except* 8-bits when I posted.
the normal signed/unsigned ranges might be -128
to +127, and 0 to 255.

You're suggesting the unsigned range ought to be 0 to 128?

That would be the scene with *your* 8-bit int type, yes. And I wasn't
"suggesting", I was wondering.
That sounds very difficult to implement;

It's quite simple actually.
what happens if you have 128
unsigned, and add one?

That depends on the type of integer: saturating or modular or
except-on-overflow. Note that my OP was basically referring to a "ranged",
or "range-restricted", integer.
Every arithmetic operation would need extra code

Indeed it would, but it is minor and in most cases not detrimental, and it
wouldn't preclude having another integer type called "fast-and-less-safe
integer" or something. I.e., should the concept pan out to be worthwhile.
And again, I wasn't *suggesting* with my OP, rather just throwing a thought
on the table for discussion.
detect the overflow, and convert numbers such as +129 to 0. With a few
special cases such as 128+128.

You would also be wasting most of the top bit (only used when representing
128). Then there are the illegal values 129 to 255, which uninitialised data
is likely to have.

And then, with the char type, you would also be limited to a maximum of
+128, cutting out most of the extra ANSI characters.

In fact, every piece of coding relying on a byte having a range of 0 to 255
(as used in much of the internet for example), would need rewriting to use
a wider int to cover that range.

Noted above. For 'char' and 'byte' types allow the full unsigned range.
Probably, for all unsigned ints that are not arithmetic types.
And what happens when, for example, reading a file of 8-bit characters which
have been created when the range allowed +130 to 255 (eg. much of the
world's stored data); you would now have a block of illegal values instead.

1. My scenario was if it had been that way from the start, would we be
better off.
2. Non-arithmetic types could use the full range that 8-bits allow.
So I can see a few snags with the idea.

Name one. ;)
 
A

Ansel

I'm curious to see an example, which you surely will post.

Well the obvious one is that you could assign the negative of any unsigned
int to an int without a care. It just seems conceptually simpler: less
thinking about trivial details in the solution domain required. Maybe not
overall though -- I'd have to think about it some more. I'd bet no one even
ever considered that way back when. Rather, they probably just thought,
"there are 8-bits, so we *must* utilize them". Conceptual simplicity
probably never entered anyone's mind. The hardware drove the design. Or so I
wonder.
 
B

BartC

Ansel said:
I was actually thinking about every width *except* 8-bits when I posted.

I chose 8-bits to simplify the examples. But, what's special about 8-bits
that they ought to be allowed a conventional unsigned range; perhaps it's
easier to think of examples where things would stop working..?
That would be the scene with *your* 8-bit int type, yes. And I wasn't
"suggesting", I was wondering.

OK, choosing a 16-bit type, you were wondering whether the unsigned range
ought to be 0 to 32768 instead of 0 to 65535.

I think it would be simpler to restrict it to 0 to 32767, ie just the
positive values of a signed int type. (That means you can't
represent -(-32768), but you can't do that as a signed value anyway.) Then
you just have to mask the top bit in every operation.
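
In code the masking scheme might amount to no more than this (the helper
names are invented, purely illustrative):

#include <stdint.h>

/* Keep a 16-bit value in 0..32767 by clearing bit 15 after every
   operation; the arithmetic is then effectively modulo 32768, so
   32767 + 1 masks back down to 0. */
#define TOP_MASK 0x7FFFu

static uint16_t hadd(uint16_t a, uint16_t b) { return (a + b) & TOP_MASK; }
static uint16_t hmul(uint16_t a, uint16_t b) { return (uint16_t)((uint32_t)a * b) & TOP_MASK; }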

But then, that would be little different from not having unsigned at all,
and only using the top half of each signed int range.
It's quite simple actually.


That depends on the type of integer: saturating or modular or
except-on-overflow. Note that my OP was basically referring to a "ranged",
or "range-restricted", integer.

There is some merit to ranged-integer types, but the range would need to be
enforced (not popular in C) and the type system becomes massively more
complicated (what's the result of adding a 100..200 type to a 150..250
type?).
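
To make the complication concrete, the compiler would have to do interval
arithmetic and invent a brand-new result type each time (the struct below
is just an illustration, not a proposal):

#include <stdio.h>

/* Adding a 100..200 value to a 150..250 value gives a value that can
   only be described statically as 250..450, i.e. yet another type. */
struct range { long lo, hi; };

static struct range range_add(struct range a, struct range b)
{
    struct range r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

int main(void)
{
    struct range a = {100, 200}, b = {150, 250};
    struct range r = range_add(a, b);
    printf("result range: %ld..%ld\n", r.lo, r.hi);   /* 250..450 */
    return 0;
}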

Also, a range is usually a subset of the allowed int range; yours would be a
superset! Because the top value will be one more than the normal range of a
signed int.

It *might* work as a range of the full unsigned int type, but that requires
the normal, unconstrained unsigned type to be still available.

Otherwise there are just too many problems, if any arbitrary value of 16, 32
or 64 bits you might encounter can only ever be represented as signed, and so
will be negative half the time. You can't even convert -1 to unsigned, and
back again, as C can now; you can't represent -1 as unsigned at all.
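
For reference, the round trip that works in C as it stands, and which the
half-range scheme would lose:

#include <stdio.h>

int main(void)
{
    int i = -1;
    unsigned u = (unsigned)i;   /* always UINT_MAX: reduced modulo UINT_MAX + 1 */
    int back = (int)u;          /* implementation-defined, but -1 in practice */
    printf("%u %d\n", u, back);
    return 0;
}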
 
M

Malcolm McLean

On Tuesday, 24 July 2012 10:43:52 UTC+1, Bart wrote:
I chose 8-bits to simplify the examples. But, what's special about 8-bits
that they ought to be allowed a conventional unsigned range; perhaps it's
easier to think of examples where things would stop working..?
You commonly use 8 bit unsigneds to pull out bit strings from an input channel.

Also, as ranges get smaller, you're more likely to use the upper half of the range. Plenty of people use the range 0-255 to represent a colour channel intensity value, or to index into a table of glyphs. However there's not going to be much difference visually between 32 thousand or 64 thousand greyscale values if you go to 16 bit colour channels. Similarly it's unlikely that you have a need for between 32,000 and 64,000 glyphs.
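
For instance, a packed RGB pixel leans on each channel covering the whole
0..255 range (ordinary C, nothing hypothetical):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t r = 200, g = 130, b = 255;     /* all three above 128 */
    uint32_t pixel = ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
    printf("pixel = %#08x, red = %u\n",
           (unsigned)pixel, (unsigned)((pixel >> 16) & 0xFFu));
    return 0;
}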
 
B

Ben Bacarisse

Ansel said:
Well the obvious one is that you could assign the negative of any unsigned
int to an int without a care. It just seems conceptually simpler: less
thinking about trivial details in the solution domain required. Maybe not
overall though -- I'd have to think about it some more.

I'm curious to see an example.

<snip>
 
B

BartC

Malcolm McLean said:
On Tuesday, 24 July 2012 10:43:52 UTC+1, Bart wrote:
Also, as ranges get smaller, you're more likely to use the upper half of
the range.
Similarly it's unlikely that you have a need for between 32,000 and 64,000
glyphs.

Unicode seems to think it needs that many (more in fact).

But, any time you want to treat two 8-bit bytes as a single 16-bit unit, or
four as 32-bit, it would be crazy not to have a suitable type to represent
that, without having to treat half the possible values as negative numbers.
How would you right-shift the result for example?
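
For example, in C as it is today (the signed half just shows the nuisance of
carrying the same bits around as a negative number):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t hi = 0xAB, lo = 0xCD;
    uint16_t word = (uint16_t)((hi << 8) | lo);   /* 0xABCD */
    printf("%#x\n", (unsigned)(word >> 4));       /* 0xabc: plain logical shift */

    int16_t sword = (int16_t)word;                /* same bits as signed: typically -21555 */
    printf("%d\n", sword >> 4);                   /* usually an arithmetic shift, sign dragged along */
    return 0;
}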
 
M

Malcolm McLean

On Tuesday, 24 July 2012 12:04:42 UTC+1, Bart wrote:
Malcolm McLean <e-mail address removed> wrote in message

Unicode seems to think it needs that many (more in fact).
Unicode has 110,000 characters at last count. So 16 bit unsigneds aren't really helpful. You might want a subset of Unicode that has more than 256 characters, but the number of situations where you want a subset with more than 32,000, less than 64,000, and you don't want to just go to 32 bits in case of future extensions is limited.
 
B

BartC

Malcolm McLean said:
On Tuesday, 24 July 2012 12:04:42 UTC+1, Bart wrote:
Unicode has 110,000 characters at last count. So 16 bit unsigneds aren't
really helpful. You might want a subset of Unicode that has more than 256
characters, but the number of situations where you want a subset with more
than 32,000, less than 64,000, and you don't want to just go to 32 bits in
case of future extensions is limited.

What you're saying is that you could live with a set of numeric types where
these ranges:

16-bit: 32768 to 65535 (32769 to 65535 in OP's scheme)
32-bit: 2 billion to 4 billion approx
64-bit: 9 to 18 trillion approx

simply don't exist? At least, without representing them with a bit-pattern
normally used for negative values, or having to use the next wider type.

That seems too restrictive at this level of language, where you don't have
recourse to anything lower level without going into machine code.
 
B

BartC

BartC said:
32-bit: 2 billion to 4 billion approx
64-bit: 9 to 18 trillion approx

'Trillion' here has the old British meaning, 1e18. I don't know how many
zillions that would be now.
 
M

Malcolm McLean

On Tuesday, 24 July 2012 14:20:09 UTC+1, Bart wrote:
Malcolm McLean <e-mail address removed> wrote in message

What you're saying is that you could live with a set of numeric types
where these ranges:

16-bit: 32768 to 65535 (32769 to 65535 in OP's scheme)
32-bit: 2 billion to 4 billion approx
64-bit: 9 to 18 trillion approx

simply don't exist?
Numbers which can go over 32768 but never over 65535 can't be represented efficiently. That's bearable. Normally if a number can go over 32768 you'll want a bigger safety margin than 65535. For instance carbon 14 dates can currently be pushed back to about 50,000 years ago. So you could represent them as 16 bit unsigned ints. But you'd be foolish to do so, because the software will break if there's an improvement in the technology.
 
A

Ansel

BartC said:
I chose 8-bits to simplify the examples. But, what's special about 8-bits
that they ought to be allowed a conventional unsigned range; perhaps it's
easier to think of examples where things would stop working..?

It's hard to let go of an integer of that range. We've already established,
though, sort of, that non-arithmetic types could still utilize the full
range -- they'd just be called something else. IOW, the smaller range of an
8-bit type makes it more of a rare jewel, because there is a higher
probability of needing "just a little more range than 128" than there is
for wider types. (Or something like that, LOL).
OK, choosing a 16-bit type, you were wondering whether the unsigned range
ought to be 0 to 32768 instead of 0 to 65535.

I think it would be simpler to restrict it to 0 to 32767, ie just the
positive values of a signed int type.
Fine.

(That means you can't
represent -(-32768), but you can't do that as a signed value anyway.) Then
you just have to mask the top bit in every operation.

But then, that would be little different from not having unsigned at all,
and and only using the top half of each signed int range.


There is some merit to ranged-integer types, but the range would need to be
enforced (not popular in C) and the type system becomes massively more
complicated
"massively"? Anyway, having the flexibility* of the different types would be
worth the bit of added complexity (now we're getting off topic) and, indeed,
there is some standard for C to address at least saturating types as I
recall. Why stop at just that?
(what's the result of adding a 100..200 type to a 150..250
type?).

It depends on the type. Don't mix different types. But when you do, let the
compiler coerce as appropriate, safely, and with the least unexpected
behavior, or else emit a compile-time error.
Also, a range is usually a subset of the allowed int range; yours would be a
superset! Because the top value will be one more than the normal range of a
signed int.

You are picking nits -- you *know* that is not what I meant in the OP.
 
A

Ansel

BartC said:
Unicode seems to think it needs that many (more in fact).

But, any time you want to treat two 8-bit bytes as a single 16-bit unit, or
four as 32-bit, it would be crazy not to have a suitable type to represent
that, without having to treat half the possible values as negative numbers.
How would you right-shift the result for example?

The point is that it is more likely to be a "damn, I really needed that
extra range" situation with 8-bits than for wider widths. Indeed, most of
the time you'd probably choose the 32-bit or 64-bit type rather than that
"in-betweener" 16-bit type anyway, as you either really need the space
efficiency or you go to the "platform word" type.
 
A

Ansel

Ben Bacarisse said:
I'm curious to see an example.

I was wondering. If you say "it ain't so", I will not "just believe you",
because I actually want to *know* "it ain't so". I am seeking an
analysis/discussion.
 
A

Ansel

The whole concept is unworkable, because you are missing out one of the
biggest uses of C. /You/ may be able to live without full-range unsigned
8-bit, 16-bit and 32-bit types for /your/ programming. But the real world
runs on embedded systems programmed in C - these far outweigh the number
of PC-style systems. And for embedded systems, unsigned data is the norm,
using exactly the number of bits provided by the hardware.

You can still have the full-width unsigned types, they just wouldn't be
*arithmetic* types. I don't know if that would work out for embedded
systems. It should be adequate though for, say, representation of character
types. That noted, still "unworkable"?
 
B

Ben Bacarisse

Ansel said:
I was wondering. If you say "it ain't so", I will not "just believe you",
because I actually want to *know* "it ain't so". I am seeking an
analysis/discussion.

I thought that maybe you'd concluded that your suggestion would
"simplify programming in C" because you'd worked through some examples,
examples you could then share. If not, no worries. I just like to know
what lies behind such posts because it helps me know how seriously to
take them.
 
A

Ansel

Ben Bacarisse said:
I thought that maybe you'd concluded that your suggestion would
"simplify programming in C" because you'd worked through some examples,
examples you could then share. If not, no worries. I just like to know
what lies behind such posts because it helps me know how seriously to
take them.

Oh, no -- I was just "throwing it out there" for the gurus, so as to not
waste any more time on it than necessary. It was more like, "hey, did anyone
ever/even consider this?", because I assert that just having the hardware
there could have blocked some trains of thought on the possibilities. I really
don't know, so I "threw it out there". I'm really not the one to identify the
potential use cases, and it's probably a team-thing anyway -- a la, this
newsgroup (!).

So far, we *know* full-range unsigneds are "very-nice-to-haves" (for
character types and such), but Java seems to do alright without them. In the
cases where someone wants a full-range unsigned, like character types, are
those kinds of things always non-arithmetic types? If so, maybe then the
"half-range" arithmetic unsigned integer idea becomes plausible. Umm... I
seem to be moving toward nixing the unsigned arithmetic integer types
altogether, like Java did, huh!
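
If it helps, the Java-style workaround is easy enough to mimic in C: store
octets in a wider signed type and mask at the point of use (a sketch,
assuming a two's-complement machine):

#include <stdio.h>

int main(void)
{
    signed char raw = (signed char)0xF0;   /* an octet read from somewhere; typically -16 */
    int value = raw & 0xFF;                /* recover 240 as a non-negative int */
    printf("%d\n", value);
    return 0;
}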
 
