Intrinsic Minimums

JKop

I've been searching the Standard for info about the minimum
"bits" of the intrinsic types, but haven't been able to
find it. Could anyone please point me to it?

-JKop
 
Andre Kostur

JKop said:
I've been searching the Standard for info about the minimum
"bits" of the intrinsic types, but haven't been able to
find it. Could anyone please point me to it?

It's not defined. Best you've got is that sizeof(char) == 1, and
sizeof(short) <= sizeof(int) <= sizeof(long).

However, that's in bytes, not bits. It is implementation-defined how many
bits are in a byte. sizeof(int) is the "natural size suggested by the
architecture of the execution environment". (Section 3.9)

And I think CHAR_BIT specifies the number of bits in a char... but that
appears to be defined in the C Standard (in <limits.h>)
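
For what it's worth, here's a quick sketch that just prints what your
particular implementation uses:

    #include <iostream>
    #include <climits>   // CHAR_BIT and the *_MIN / *_MAX macros

    int main()
    {
        // sizeof counts bytes; CHAR_BIT says how many bits each byte has.
        std::cout << "bits per byte: " << CHAR_BIT << '\n';
        std::cout << "char : " << sizeof(char)  * CHAR_BIT << " bits\n";
        std::cout << "short: " << sizeof(short) * CHAR_BIT << " bits\n";
        std::cout << "int  : " << sizeof(int)   * CHAR_BIT << " bits\n";
        std::cout << "long : " << sizeof(long)  * CHAR_BIT << " bits\n";
        return 0;
    }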
 
John Harrison

It's not defined. Best you've got is that sizeof(char) == 1, and
sizeof(short) <= sizeof(int) <= sizeof(long).

However, that's in bytes, not bits. It is implementation-defined how many
bits are in a byte. sizeof(int) is the "natural size suggested by the
architecture of the execution environment". (Section 3.9)

And I think CHAR_BIT specifies the number of bits in a char... but that
appears to be defined in the C Standard (in <limits.h>)

C requires short >= 16 bits, int >= 16 bits, long >= 32 bits. These
minimums are implied by the constraints given on INT_MIN, INT_MAX etc. in
<limits.h>. Presumably C++ inherits this from C.
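
For example, a quick sketch that prints the <climits> macros on your
implementation; whatever it prints must be at least the minimums above:

    #include <iostream>
    #include <climits>

    int main()
    {
        // The standard only guarantees minimum magnitudes for these.
        std::cout << "SHRT_MAX  (>= 32767)      : " << SHRT_MAX  << '\n';
        std::cout << "USHRT_MAX (>= 65535)      : " << USHRT_MAX << '\n';
        std::cout << "INT_MAX   (>= 32767)      : " << INT_MAX   << '\n';
        std::cout << "LONG_MAX  (>= 2147483647) : " << LONG_MAX  << '\n';
        return 0;
    }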

john
 
JKop

John Harrison posted:

C requires short >= 16 bits, int >= 16 bits, long >= 32 bits. These
minimums are implied by the constraints given on INT_MIN, INT_MAX etc.
in <limits.h>. Presumably C++ inherits this from C.

john


I'm writing a prog that'll use Unicode. To represent a
Unicode character, I need a data type that can be set to
65,536 distinct possible values; which in today's world
of computing equates to 16 bits. wchar_t is the natural
choice, but is there any guarantee in the standard that
it'll be 16 bits? If not, then is unsigned short the way to
go?

This might sound a bit odd, but... if an unsigned short
must be at least 16 bits, then does that *necessarily* mean
that it:

A) Must be able to hold 65,536 distinct values.
B) And be able to store integers in the range 0 -> 65,535 ?

Furthermore, does a signed short int have to be able to
hold a value between:

A) -32,767 -> 32,768

B) -32,768 -> 32,767

I've also heard that some systems are stupid enough (oops!
I mean, poorly enough designed) to have two values for zero,
resulting in:

-32,767 -> 32,767


For instance, I don't care if someone tells me it's 3
bits, just so long as it can hold 65,536 distinct values!


-JKop
 
JKop

I've just realized something:

char >= 8 bits
short int >= 16 bits
int >= 16 bits
long int >= 32 bits

And:

short int >= int >= long int


On WinXP, it's like so:

char : 8 bits
short : 16 bits
int : 32 bits
long : 32 bits


Anyway,

Since there's a minimum, why haven't they just been given definite values,
like:

char : 8 bits
short : 16 bits
int : 32 bits
long : 64 bits

or maybe even names like:

int8
int16
int32
int64

And so then if you want a greater amount of distinct possible values,
there'll be standard library classes. For instance, if you want a 128-Bit
integer, then you're looking for a data type that can store 3e+38 approx.
distinct values. Well... if a 64-Bit integer can store 1e+19 approx values,
then put two together and voila, you've got a 128-Bit number:

class int128
{
private:

int64 a;
int64 b;
//and so on
};
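
(For what it's worth, the "two halves" idea can be sketched in ordinary
C++ today; this assumes unsigned long long is a 64-bit type, which isn't
guaranteed, and only addition is shown:)

    #include <iostream>

    // Rough sketch: an unsigned 128-bit value made of two 64-bit halves.
    struct uint128
    {
        unsigned long long hi;
        unsigned long long lo;
    };

    uint128 add(uint128 a, uint128 b)
    {
        uint128 r;
        r.lo = a.lo + b.lo;                  // low halves wrap modulo 2^64
        r.hi = a.hi + b.hi + (r.lo < a.lo);  // add 1 if the low half carried
        return r;
    }

    int main()
    {
        uint128 x = { 0, 0xFFFFFFFFFFFFFFFFULL };  // 2^64 - 1
        uint128 y = { 0, 1 };
        uint128 r = add(x, y);                     // expect hi == 1, lo == 0
        std::cout << r.hi << " " << r.lo << '\n';
        return 0;
    }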

Or while I'm thinking about that, why not be able to specify whatever size
you want, as in:

int8 number_of_sisters;

int16 population_my_town;

int32 population_of_asia;

int64 population_of_earth;

or maybe even:

int40 population_of_earth;


Some people may find that this is a bit ludicrous, but you can do it
already yourself with classes: if you want a 16,384-bit number, then all you
need to do is:

class int16384
{
private:
unsigned char data[2048];

//and so on

};

Or maybe even be able to specify how many distinct possible combinations you
need. So for unicode:

unsigned int<65536> username[15];


This all seems so simple in my head - why can't it just be so!


-JKop
 
Andre Kostur

JKop said:
I've just realized something:

char >= 8 bits
short int >= 16 bits
int >= 16 bits
long int >= 32 bits

And:

short int >= int >= long int


On WinXP, it's like so:

char : 8 bits
short : 16 bits
int : 32 bits
long : 32 bits

That's one platform. There are also platforms with 9-bit chars and 36-bit
ints... (at least if I recall correctly, it was 36 bits).
 
Mark A. Gibbs

JKop said:
I'm writing a prog that'll use Unicode. To represent a
Unicode character, I need a data type that can be set to
65,536 distinct possible values; which in today's world
of computing equates to 16 bits. wchar_t is the natural
choice, but is there any guarantee in the standard that
it'll be 16 bits? If not, then is unsigned short the way to
go?

16 bits will always store 65,536 distinct values, regardless of what
day's world the programmer is living in, and regardless of how the
platform interprets those 65,536 values (eg, positive and negative 0).

as far as my reading goes there are no explicit guarantees for the size
of wchar_t. however, wchar_t will be (in essence) an alias for one of
the other integer types. what i am not sure of is whether or not
"integer types" includes any of the char derivatives. if not, then the
underlying type for wchar_t must be either short, int, or long, which
would therefore imply a minimum of 16 bits.

could someone confirm or deny my interpretation here?

now, on another less c++-y note, you have made the classic java
"misunderestimation" of unicode. unicode characters may require up to 32
bits, not 16
(http://www.unicode.org/standard/principles.html#Encoding_Forms). given
that gem, your *best* bet would appear to be not wchar_t, not short, but
*long*.

of course, you should have no problem extending the iostreams, strings,
etc. for the new character type ^_^. enjoy.

indi
 
Old Wolf

JKop said:
I'm writing a prog that'll use Unicode. To represent a
Unicode character, I need a data type that can be set to
65,536 distinct possible values;

There were more than 90,000 possible Unicode characters last
time I looked (there are probably more now).

If you use a 16-bit type to store this, you have to either:
- Ignore characters whose code is > 65535, or
- Use a multi-byte encoding such as UTF-16, and then all of
your I/O functions will have to be UTF-16 aware.
which in today's world of computing equates to 16 bits.

A bit of mathematical thought will convince you that you need
at least 16 Binary digITs to represent 2^16 values.
wchar_t is the natural choice, but is there any guarantee
in the standard that it'll be 16 bits?

No, in fact it's unlikely to be 16 bit. It's only guaranteed to be
able to support "the largest character in all supported locales",
and locales are implementation-dependent, so it could be 8-bit on
a system with no Unicode support.

On MS windows, some compilers (eg. gcc) have 32-bit wchar_t and
some (eg. Borland, Microsoft) have 16-bit. On all other systems
that I've encountered, it is 32-bit.

This is quite annoying (for people whose language falls in the
over-65535 region especially). One can only hope that MS will
eventually come to their senses, or perhaps that someone will
standardise a system of locales.

If you want to write something that's portable to all Unicode
platforms, you will have to use UTF-16 for your strings,
unfortunately. This means you can't use all the standard library
algorithms on them. Define a type "utf16_t" which is an unsigned short.

The only other alternative is to use wchar_t and decode UTF-16
to plain wchar_t (ignoring any characters outside the range of
your wchar_t) whenever you receive a wchar_t string encoded as
UTF-16 (and don't write code that's meant to handle Chinese text).
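
For example, here's a rough sketch of the surrogate-pair arithmetic (no
error checking; it assumes unsigned short is exactly 16 bits and unsigned
long is at least 21 bits wide):

    #include <iostream>

    typedef unsigned short utf16_t;    // one UTF-16 code unit
    typedef unsigned long  codepoint;  // wide enough for any Unicode code point

    // Decode one code point starting at p and advance p past the units used.
    codepoint decode_utf16(const utf16_t*& p)
    {
        utf16_t u = *p++;
        if (u >= 0xD800 && u <= 0xDBFF)   // lead (high) surrogate
        {
            utf16_t v = *p++;             // trail (low) surrogate, assumed valid
            return 0x10000 + ((codepoint(u) - 0xD800) << 10) + (v - 0xDC00);
        }
        return u;                         // ordinary BMP character
    }

    int main()
    {
        const utf16_t text[] = { 0xD834, 0xDD1E, 0x0041, 0 };  // U+1D11E, then 'A'
        const utf16_t* p = text;
        std::cout << std::hex << decode_utf16(p) << '\n';      // prints 1d11e
        std::cout << std::hex << decode_utf16(p) << '\n';      // prints 41
        return 0;
    }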
This might sound a bit odd, but... if an unsigned short
must be at least 16 bits, then does that *necessarily* mean
that it:

A) Must be able to hold 65,536 distinct values.
B) And be able to store integers in the range 0 -> 65,535 ?
Yes

Furthermore, does a signed short int have to be able to
hold a value between:

A) -32,767 -> 32,768

B) -32,768 -> 32,767
No

I've also heard that some systems are stupid enough (oops!
I mean, poorly enough designed) to have two values for zero,
resulting in:

-32,767 -> 32,767

Yes (these are all archaic though; from a practical point of
view you can assume 2's complement, i.e. -32768 to 32767).
FWIW the 3 supported systems are (for x > 0):
2's complement: -x == ~x + 1
1's complement: -x == ~x
sign-magnitude: -x == x | (the sign bit)
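
(A quick way to see the 2's complement identity on a typical machine; the
other two you're unlikely to find hardware to try on:)

    #include <iostream>

    int main()
    {
        short x = 12345;
        // On a 2's complement machine, negation and "flip the bits, add one"
        // produce the same value.
        std::cout << std::boolalpha << (-x == ~x + 1) << '\n';   // prints true
        return 0;
    }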
 
Sharad Kala

C requires short >= 16 bits, int >= 16 bits, long >= 32 bits. These
minimums are implied by the constraints given on INT_MIN, INT_MAX etc. in
<limits.h>. Presumably C++ inherits this from C.

Yes, that's correct.
At the end of section 18.2.2, there is a specific reference to ISO C
subclause 5.2.4.2.1, so that section is included by reference. It gives
the definitions of CHAR_BIT, UCHAR_MAX, etc.
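
The same guarantees are also visible through std::numeric_limits in the
C++ header <limits>, which is sometimes more convenient. A small sketch:

    #include <iostream>
    #include <limits>

    int main()
    {
        // numeric_limits<unsigned char>::digits is the same value as CHAR_BIT.
        std::cout << "bits in a char : "
                  << std::numeric_limits<unsigned char>::digits << '\n';
        std::cout << "short range    : " << std::numeric_limits<short>::min()
                  << " to " << std::numeric_limits<short>::max() << '\n';
        std::cout << "ushort max     : "
                  << std::numeric_limits<unsigned short>::max() << '\n';
        std::cout << "long max       : "
                  << std::numeric_limits<long>::max() << '\n';
        return 0;
    }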

-Sharad
 
John Harrison

This might sound a bit odd, but... if an unsigned short
must be at least 16 bits, then does that *necessarily* mean
that it:

A) Must be able to hold 65,536 distinct values.
B) And be able to store integers in the range 0 -> 65,535 ?

Yes, USHRT_MAX must be at least 65535, and all unsigned types must obey
the laws of modulo 2-to-the-power-N arithmetic, where N is the number of
bits. That implies that the minimum value is 0, and that every value
from 0 up to 2-to-the-power-N minus 1 must be representable.
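
For instance, a little sketch of the modulo behaviour (the second result
assumes unsigned short is exactly 16 bits, which it need not be):

    #include <iostream>

    int main()
    {
        unsigned short u = 0;
        u = u - 1;            // wraps modulo 2^N, so u becomes USHRT_MAX (>= 65535)
        std::cout << u << '\n';

        unsigned short v = 65535;
        v = v + 1;            // wraps back to 0 only if unsigned short is 16 bits
        std::cout << v << '\n';
        return 0;
    }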
Furthermore, does a signed short int have to be able to
hold a value between:

A) -32,767 -> 32,768

B) -32,768 -> 32,767

I've also heard that some systems are stupid enough (oops!
I mean, poorly enough designed) to have two values for zero,
resulting in:

-32,767 -> 32,767

That's correct. I seriously doubt you would meet such a system in practice
(except in a museum).
For instance, I don't care if someone tells me it's 3
bits, just so long as it can hold 65,536 distinct values!


-JKop

john
 
JKop

Mark A. Gibbs posted:
of course, you should have no problem extending the iostreams, strings,
etc. for the new character type ^_^. enjoy.

You're absolutely correct.

basic_string<unsigned long> stringie;


-JKop
 
JKop

Old Wolf posted:
from a practical point of
view you can assume 2's complement, ie. -32768 to 32767).
FWIW the 3 supported systems are (for x > 0):
2's complement: -x == ~x + 1
1's complement: -x == ~x
sign-magnitude: -x == x | (the sign bit)

So wouldn't that be -32,767 -> 32,768?

I assume that 1's complement is the one that has both positive and negative
0.


As for the sign-magnitude thingie, that's interesting!

unsigned short blah = 65535;

signed short slah = blah;

slah == -32767 ? ?


-JKop
 
John Harrison

JKop said:
Mark A. Gibbs posted:


You're absolutely correct.

basic_string<unsigned long> stringie;

I think you'll also need a char_traits class.

basic_string<unsigned long, ul_char_traits> stringie;
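
Something along these lines might do as a starting point. It's a rough,
untested sketch using the ul_char_traits name from above; the ulstring
typedef at the end is just a convenience name, and eof() is a compromise
in the same way WEOF is for a 32-bit wchar_t:

    #include <string>    // std::basic_string
    #include <ios>       // std::streamoff, std::streampos
    #include <cstring>   // std::memcpy, std::memmove
    #include <cstddef>   // std::size_t
    #include <cwchar>    // std::mbstate_t

    struct ul_char_traits
    {
        typedef unsigned long  char_type;
        typedef unsigned long  int_type;    // eof() necessarily overlaps one value
        typedef std::streamoff off_type;
        typedef std::streampos pos_type;
        typedef std::mbstate_t state_type;

        static void assign(char_type& c1, const char_type& c2) { c1 = c2; }
        static bool eq(char_type a, char_type b) { return a == b; }
        static bool lt(char_type a, char_type b) { return a < b; }

        static int compare(const char_type* a, const char_type* b, std::size_t n)
        {
            for (std::size_t i = 0; i < n; ++i)
            {
                if (a[i] < b[i]) return -1;
                if (b[i] < a[i]) return 1;
            }
            return 0;
        }

        static std::size_t length(const char_type* s)
        {
            std::size_t n = 0;
            while (s[n] != 0) ++n;   // a zero-valued element terminates the string
            return n;
        }

        static const char_type* find(const char_type* s, std::size_t n,
                                     const char_type& c)
        {
            for (std::size_t i = 0; i < n; ++i)
                if (s[i] == c) return s + i;
            return 0;
        }

        static char_type* move(char_type* d, const char_type* s, std::size_t n)
        { std::memmove(d, s, n * sizeof(char_type)); return d; }

        static char_type* copy(char_type* d, const char_type* s, std::size_t n)
        { std::memcpy(d, s, n * sizeof(char_type)); return d; }

        static char_type* assign(char_type* s, std::size_t n, char_type c)
        { for (std::size_t i = 0; i < n; ++i) s[i] = c; return s; }

        static int_type  to_int_type(char_type c)            { return c; }
        static char_type to_char_type(int_type i)            { return i; }
        static bool      eq_int_type(int_type a, int_type b) { return a == b; }
        static int_type  eof()                               { return int_type(-1); }
        static int_type  not_eof(int_type i) { return i == eof() ? 0 : i; }
    };

    typedef std::basic_string<unsigned long, ul_char_traits> ulstring;

That gets you the string itself; hooking it up to iostreams would need the
corresponding facets as well.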

john
 
Rolf Magnus

JKop said:
John Harrison posted:




I'm writing a prog that'll use Unicode. To represent a
Unicode character, I need a data type that can be set to
65,536 distinct possible values;

No, you need more than that for full Unicode support.
which in today's world of computing equates to 16 bits. wchar_t is
the natural choice, but is there any guarantee in the standard that
it'll be 16 bits?

It doesn't need to be exactly 16 bits. It can be more. In g++, it's 32 bits.
If not, then is unsigned short the way to go?

This might sound a bit odd, but... if an unsigned short
must be at least 16 bits, then does that *necessarily* mean
that it:

A) Must be able to hold 65,536 distinct values.
B) And be able to store integers in the range 0 -> 65,535 ?

It's actually rather the other way round. It must explicitly be able to
hold at least the range from 0 to 65535, which implies a minimum of 16 bits.
Furthermore, does a signed short int have to be able to
hold a value between:

A) -32,767 -> 32,768

B) -32,768 -> 32,767
Neither.

I've also heard that some systems are stupid enough (oops!
I mean, poorly enough designed) to have two values for zero,
resulting in:

-32,767 -> 32,767

That's the minimum range that a signed short int must support.
 
Rolf Magnus

JKop said:
I've just realized something:

char >= 8 bits
short int >= 16 bits
int >= 16 bits
long int >= 32 bits
Yes.

And:

short int >= int >= long int

Uhm, no. But I guess it's just a typo :)
On WinXP, it's like so:

char : 8 bits
short : 16 bits
int : 32 bits
long : 32 bits


Anyway,

Since there's a minimum, why haven't they just been given definite
values, like:

char : 8 bits
short : 16 bits
int : 32 bits
long : 64 bits

Because there are other platforms for which other sizes may fit better.
There are even systems that only support data types whose size is a
multiple of 24 bits. C++ can still be implemented on those, because the
size requirements in the standard don't have fixed values. Also, int is
supposed (though not required) to be the machine's native type, the
fastest one. On 64-bit platforms, it often isn't, though.
or maybe even names like:

int8
int16
int32
int64

C99 has something like this in the header <stdint.h>. It further defines
smallest and fastest integers with a specific minimum size, like:

int_fast16_t
int_least32_t

This is a good thing, because an exact size is only rarely needed. Most
often, you don't care about the exact size as long as you get the fastest
(or smallest) type that provides at least a certain range.
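
If your compiler ships the C99 header, it is usable from C++ too. A quick
sketch:

    #include <stdint.h>   // C99 header; many C++ compilers provide it as well
    #include <iostream>

    int main()
    {
        uint16_t      code_unit = 0xFFFF;  // exactly 16 bits (an optional typedef)
        int_least32_t total     = 0;       // smallest signed type with >= 32 bits
        int_fast16_t  i         = 0;       // fastest signed type with >= 16 bits

        std::cout << sizeof(code_unit) << ' '
                  << sizeof(total) << ' '
                  << sizeof(i) << '\n';    // sizes in bytes on this platform
        return 0;
    }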
And so then if you want a greater amount of distinct possible values,
there'll be standard library classes. For instance, if you want a
128-Bit integer, then you're looking for a data type that can store
3e+38 approx. distinct values. Well... if a 64-Bit integer can store
1e+19 approx values, then put two together and voila, you've got a
128-Bit number:

class int128
{
private:

int64 a;
int64 b;
//and so on
};

Or while I'm thinking about that, why not be able to specify whatever
size you want, as in:

int8 number_of_sisters;

int16 population_my_town;

int32 population_of_asia;

int64 population_of_earth;

or maybe even:

int40 population_of_earth;


Some people may find that this is a bit ludicrous, but you can do it
already yourself with classes: if you want a 16,384-bit number, then
all you need to do is:

class int16384
{
private:
unsigned char data[2048];

//and so on

};

Or maybe even be able to specify how many distinct possible
combinations you need. So for unicode:

unsigned int<65536> username[15];


This all seems so simple in my head - why can't it just be so!

It isn't as simple as you might think. If it were, you could just start
writing a proof-of-concept implementation. :)
 
Nemanja Trifunovic

On MS windows, some compilers (eg. gcc) have 32-bit wchar_t and
some (eg. Borland, Microsoft) have 16-bit. On all other systems
that I've encountered, it is 32-bit.

This is quite annoying (for people whose language falls in the
over-65535 region especially). One can only hope that MS will
eventually come to their senses, or perhaps that someone will
standardise a system of locales.

It is hardly annoying for these people, because they died long before
computers were invented. The non-BMP region contains mostly symbols for
dead languages.
 
