64 bit C++ and OS defined types

Alf P. Steinbach · Apr 5, 2009

* Ian Collins:

Well I do have some 8 and 16 bit embedded development boards I could
power up....

If you exclude modern cell phones, engine management units, toasters.....

Hm, EC++ is AFAIK dead, and I'm curious: are there *any* C++ compilers extant
for 16-bit addresses (not 16-bit data, but 16-bit addresses)?

Cheers,

- Alf (wondering)

James Kanze · Apr 5, 2009

* James Kanze:

* Juha Nieminen:

[...]

Suggesting to avoid using "get" in a getter method is as
superfluous here as suggesting eg. that one should use
reverse polish notation or how many spaces should be used
for indentation. It's a matter of style, not a matter of
whether it's a standard C++ feature.
It is bad style in C++ precisely because C++ doesn't have
any language feature to make use of it (Java does have such
a feature).

I'm not sure what you mean by "feature" here.

Introspection. Which makes it possible to create tools that
depend on a certain naming convention, tools that let you
treat a "component" class very generally, including e.g.
design time manipulation. With support from the class!

And the original convention for that in Java was called "Java
beans".

Quoting Wikipedia on beans: "The class properties must be
accessible using get, set, and other methods (so-called
accessor methods and mutator methods), following a standard
naming convention. This allows easy automated inspection and
updating of bean state within frameworks, many of which
include custom editors for various types of properties.".

Yes, but the "convention" existed and was documented by Sun
before Java supported introspection. And all introspection
really requires is a convention, not any specific convention
(but it would be somewhat difficult to implement if there were
no specific prefix).

Actually you're right that I did put things to a point. I
personally prefer a mixture, with the "set" prefix.

In other words, the worst of both worlds.

Seriously, I mentioned the two (and not more) because those are
the only two I've seen in any actual programming guidelines, or
in real code. It's probably a reasonable argument to say that
the two functions do different things, so deserve different
names. In that case, however, it's just as reasonable to insist
that the names reflect what they do, i.e. get and set. And I
find it just as reasonable (if not more) to consider that these
aren't really "functions", despite the syntax; they "expose" (in
a controlled way) a data member, and should thus have the public
name of the data member.

A third solution, of course, would be to have:
    int size() const ;
    IntProxy size() ;
so you could write:
    int a = x.size() ;
    x.size() = a ;
In many ways, this is the most elegant. But it just seems more
work than necessary (to me anyway) to implement all of those
proxies, and C++ programmers don't really expect it.
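
The proxy approach could be sketched roughly as follows. This is only an illustration under assumptions: the `Widget` class and its `size_` member are hypothetical, and a real proxy would probably validate or notify in its assignment operator.

```cpp
class Widget;  // hypothetical owner class, not from the thread

// Proxy returned by the non-const size(): assigning to it acts as the
// setter, converting it to int acts as the getter.
class IntProxy {
public:
    explicit IntProxy(Widget& owner) : owner_(owner) {}
    IntProxy& operator=(int value);
    operator int() const;
private:
    Widget& owner_;
};

class Widget {
public:
    int size() const { return size_; }            // chosen for const objects
    IntProxy size() { return IntProxy(*this); }   // chosen for non-const objects
private:
    friend class IntProxy;
    int size_ = 0;
};

inline IntProxy& IntProxy::operator=(int value) {
    owner_.size_ = value;   // a real setter could check the value here
    return *this;
}

inline IntProxy::operator int() const { return owner_.size_; }
```

With this in place both `int a = x.size();` and `x.size() = a;` compile for a non-const `x`, at the price of one small proxy class per exposed member type.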

That's because of a preference for readability and my very
subjective opinion of what constitutes readability, he he.

And that ties in with that one practical and very C++ specific
benefit of avoiding the prefixes has only to do with "get",
not with "set".

Namely, it lets the client code choose whether to
manually optimize (awkward notation) or not (especially when
the compiler does it), by doing

void getPopulationData( Container& c )
{
    Container result;
    ...
    result.swap( c );
}

Container populationData()
{
    Container c;
    getPopulationData( c );
    return c;
}

Here client code will preferentially use "populationData",
relying on RVO for the cases where efficiency matters.

If it turns out that the compiler isn't up to the task and
measurements show that the efficiency of these calls does matter a
lot, then client code can fall back to using
getPopulationData, in the place or places where it affects
performance.

How does this change anything with regards to the choice above?
If you use get/set prefixes, overload resolution will come into
play for the selection of the get function. If you use no
prefixes, then the get name is still available for use as above.

Alf P. Steinbach · Apr 5, 2009

* James Kanze:

How does this change anything with regards to the choice above?

It's readability, of /the calling code/.

Calling code that says

populationData( o );

doesn't really say anything about what it does. Is it perhaps an assertion that
'o' is population data? Is it perhaps an extraction of population data from 'o'?
What's going to happen here -- or not?

On the other hand, code that says

getPopulationData( o );

says what it does, because there are not many rôles that o can play here and
still have a reasonable programmer's-english sentence construct.

And also code that says

Container const o = populationData();

says what it does.

Of course, also with a "get" prefix there it says what it does because the
reader recognizes the prefix as a common redundant prefix. But, being redundant,
it is redundant. IMHO just visual clutter and more to read and write.

If you use get/set prefixes, overload resolution will come into
play for the selection of the get function. If you use no
prefixes, then the get name is still available for use as above.

Overload resolution is fine with respect to the goal of having the correct
implementation invoked.

It's not fine with respect to e.g. searching in an editor.

And it's not fine with respect to readability, and other human cognitive
activities such as discussing the code -- then distinct names are bestest.

Cheers,

- Alf

James Kanze · Apr 5, 2009

Why? It's a classic application of "fail fast" at work: going
into an array with -x __happens__. E.g. bad decrement
somewhere gives you -1, or, bad difference gives (typically
small!) -x. Now, that typically ends in reading/writing bad
memory, which is with small negatives detected quickly only if
you're lucky. If, however, that decrement/subtraction is done
unsigned, you typically explode immediately, because there's a
very big chance that memory close to 0xFFFF... ain't yours.

Sorry, but the array class will certainly catch a negative index
(provided it uses a signed type for indexes).
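
As a minimal sketch of that point, assuming a hypothetical `CheckedArray` class (not anything from the posters' own code), a signed index type lets the class report a negative index as exactly that, rather than as a very large index:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked array, for illustration only.
class CheckedArray {
public:
    // The caller is assumed to pass a non-negative element count.
    explicit CheckedArray(std::ptrdiff_t n)
        : data_(static_cast<std::size_t>(n)) {}

    int& at(std::ptrdiff_t i) {
        // With a signed index a negative value is detected directly...
        if (i < 0) throw std::out_of_range("negative index");
        // ...and stays distinguishable from an index that is too large.
        if (i >= static_cast<std::ptrdiff_t>(data_.size()))
            throw std::out_of_range("index past the end");
        return data_[static_cast<std::size_t>(i)];
    }

private:
    std::vector<int> data_;
};
```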

Conceptually, there is an argument in favor of using a cardinal,
rather than an integer, as the index type, given that the
language (and the library) forces indexes to start at 0. (My
pre-standard array classes didn't, but that's another issue.)
But C++ doesn't have a type which emulates cardinal, so we're
stuck here. The fact remains that the "natural" type for all
integral values is int---it's what you get from an integral
literal by default, for example, it's what short, char, etc.
(and their unsigned equivalents, if they fit in an int, which
they usually do) promote to. And mixing signed and unsigned
types in arithmetic expressions is something to be avoided. So
you want to avoid an unsigned type in this context.
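
The promotion and mixing rules mentioned here can be checked with a few lines. This is a sketch; the helper names `mixedLess` and `mathematicalLess` are made up for the illustration.

```cpp
#include <climits>
#include <type_traits>
#include <utility>

// An unsuffixed decimal literal that fits in int has type int.
static_assert(std::is_same<decltype(42), int>::value, "literal is int");

// unsigned short promotes to int whenever int can hold every unsigned short
// value; the first operand of || covers platforms where it cannot.
static_assert(USHRT_MAX > INT_MAX ||
              std::is_same<decltype(std::declval<unsigned short>() +
                                    std::declval<unsigned short>()),
                           int>::value,
              "unsigned short + unsigned short has type int");

// Mixed comparison: the usual arithmetic conversions turn the int operand
// into unsigned int before comparing, so -1 becomes a huge value.
inline bool mixedLess(int s, unsigned int u) {
    return s < u;  // most compilers warn about this comparison
}

// Checking the sign first gives the mathematically expected answer.
inline bool mathematicalLess(int s, unsigned int u) {
    return s < 0 || static_cast<unsigned int>(s) < u;
}
```

Here `mixedLess(-1, 1u)` is false while `mathematicalLess(-1, 1u)` is true, which is the kind of surprise that mixing signed and unsigned operands produces.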

True, but why are signed and unsigned mixed in the first
place? I say, because of the poor design! IOW, in a poor
design, it's bad. So how about clearing that up first?

That's what we're trying to do. Since integral literals and, in
contexts where the usual arithmetic conversions apply, unsigned
char and unsigned shorts have signed type, you're pretty much
stuck.

I might add that a compiler is allowed to check for arithmetic
overflow in the case of signed arithmetic, and not in the case
of unsigned arithmetic. Realistically, I've only heard of one
that did, however, so this is more a theoretical argument than a
practical one.

True, but they exist for signed types, too. Only additional
problem with unsigned is that subtraction is more tricky (must
know that a>b before doing a-b). But then, I question the
frequency at which e.g. sizes are subtracted.

Indexes are often subtracted. And there's no point in
supporting a size larger than that you can index.

And even then (get this!), it's fine. Result is __signed__ and
it all works.

Since when? And with what compiler? The standard states
clearly that for *all* binary operators between the same type,
the results have that type.
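
A quick way to check the result type is sketched below; the helper name `difference` is hypothetical.

```cpp
#include <limits>
#include <type_traits>

// Subtracting two unsigned ints yields an unsigned int, never a signed type.
static_assert(std::is_same<decltype(2u - 3u), unsigned int>::value,
              "unsigned - unsigned has type unsigned");

inline unsigned int difference(unsigned int a, unsigned int b) {
    return a - b;  // wraps modulo 2^N when a < b
}
```

So `difference(2u, 3u)` is the largest `unsigned int` value. Converting that result to `int` afterwards often gives -1 on two's complement implementations, which may be why the subtraction appears to "work", but the subtraction itself is done in the unsigned type.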

(Hey, look! Basic math at work: subtract two natural numbers
and you don't get a natural number!)

C++ arithmetic doesn't quite conform to the rules of basic
arithmetic. To a certain degree, it can't, since basic
arithmetic deals with infinite sets---you can't get overflow.
Unsigned arithmetic in C++ explicitly follows completely
different rules. (In passing: if you do happen to port to a
machine not using 2's complement, unsigned arithmetic is likely
to be significantly slower than signed. The C++ compiler for
the Unisys 2200 even has an option to turn off conformance here,
because of the performance penalty it exacts.)

Well, it works unless you actually work on an array of bytes,
but that example is contrived and irrelevant, I might agree
with you there.

I also question the relevance of signed for subtraction of
indices, because going into an array with a-b where a<b is
just as much of a bug as with unsigned. So with signed, there
has to be a check (if (a-b >= 0)), with unsigned, there has to
be a check (if (a>b)). So I see no gain with signed, only
different forms.

There's a fundamental problem with signed. Suppose I have an
index into an array, and a function which, given that index,
returns how many elements forward or back I should move. With
unsigned indexes, the function must return some sort of struct,
with a flag indicating whether the offset is positive or
negative, and the calling code needs an if. With signed
indexes, no problem---the function just returns a negative value
to go backwards.
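
The contrast could look something like this. It is a sketch only: the stepping rule and all names (`stepSigned`, `Step`, `stepUnsigned`, and the two `move` helpers) are invented for the illustration.

```cpp
#include <cstddef>

// Signed version: the direction is carried by the sign of the result.
inline std::ptrdiff_t stepSigned(std::ptrdiff_t index) {
    return index % 2 == 0 ? 3 : -1;   // arbitrary rule, for illustration
}

// Unsigned version: a magnitude plus an explicit direction flag.
struct Step {
    std::size_t distance;
    bool backward;
};

inline Step stepUnsigned(std::size_t index) {
    return index % 2 == 0 ? Step{3, false} : Step{1, true};
}

inline std::ptrdiff_t moveSigned(std::ptrdiff_t index) {
    return index + stepSigned(index);          // no branch needed
}

inline std::size_t moveUnsigned(std::size_t index) {
    Step s = stepUnsigned(index);
    // The caller has to branch on the direction flag.
    return s.backward ? index - s.distance : index + s.distance;
}
```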

[...]

You claim that these potential bugs are important. I claim
that they are not, because I see very little subtraction of
indices in code I work with, and very little backwards-going
loops.

So we work with different types of code.

Note that if you subtract pointers, you also get a signed value
(possibly undefined, if you allow arrays to have a size greater
than std::numeric_limits<ptrdiff_t>::max()).

That may be different for you, but I'll still wager that these
are overall in low percentiles.

You also conveniently chose to overlook (or worse yet, call it
hand-waving) the true nature of a count and an index (they
are natural numbers). I can't see how designing closer to
reality can be pointless.

They are a subset of the natural numbers (cardinals), and the
natural numbers are a subset of integers. C++ has a type which
sort of approximates integers; it doesn't have a type which
approximates cardinals. The special characteristics of unsigned
types mean that they are best limited to raw memory (no
calculations), bit maps and such (only bitwise operations) and
cases where you need those special characteristics (modulo
arithmetic). Generally speaking, when I see code which uses
arithmetic operators on unsigned types, and doesn't actually
need modulo arithmetic, I suppose that the author didn't really
understand unsigned in C++.

James Kanze · Apr 5, 2009

* Goran:

But consider with signed index that is negative, corresponding
to large value unsigned,

a[i]

If (1) the C++ implementation is based on unchecked two's
complement (which is the usual), then the address computation
yields the same as with unsigned index. So, no advantage for
unsigned.

If the C++ implementation isn't based on unchecked two's
complement, then either (2) you get the same as with unsigned
index (no advantage for unsigned), or (3) you get a trap on
the /arithmetic/.

So in all three possible cases unsigned lacks any advantage
over signed.

If the implementation isn't based on 2's complement, unsigned
arithmetic is likely to be considerably slower than signed,
since the compiler has to generate the code to implement the
modulo behavior of unsigned correctly.

This, not from data -- for I haven't any experience that I
can recall with code that supplies negative index (or
corresponding with unsigned)

But you're certainly familiar with code which uses negative
offsets to an index. Binary search, for example.

The argument was given that indexes are natural numbers. That's
not totally true, since we expect to be able to add negative
values to them. (In an unchecked 2's complement machine, of
course, we'll probably land on our feet with the correct value
anyway. But it's hardly what I would call "clean". And if for
some reason, the offset passes through a smaller unsigned type,
e.g. unsigned int, on most 32 bit machines, we are screwed.)
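
The failure mode described in the parenthesis can be reproduced with fixed-width types. The sketch below assumes the index type is wider than the type the offset passes through (a 64-bit index and a 32-bit unsigned, say); the function names are hypothetical.

```cpp
#include <cstdint>

// The offset is converted straight to the index width: adding the wrapped
// value lands on the intended index (modulo 2^64 arithmetic).
inline std::uint64_t addWide(std::uint64_t index, std::int64_t offset) {
    return index + static_cast<std::uint64_t>(offset);
}

// The offset first passes through a narrower unsigned type: -1 becomes
// 4294967295 and the sign information is gone for good.
inline std::uint64_t addNarrow(std::uint64_t index, std::int64_t offset) {
    std::uint32_t narrowed = static_cast<std::uint32_t>(offset);
    return index + narrowed;
}
```

With an index of 10 and an offset of -1, `addWide` gives 9 while `addNarrow` gives 4294967305.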

James Kanze · Apr 5, 2009

* (e-mail address removed):

It's not dreadful to have size_t as a built-in type.

Technically, it's not a built-in type, but a typedef to a
built-in type.

It's dreadful to have it as an unsigned type.

Yes and no. There's some reasonable argument for it being
unsigned, *but* given that it's unsigned, it's being used (in
the standard and elsewhere) in a lot of places where ptrdiff_t
would be more appropriate. Basically, anytime you have
something that could reasonably be, in some code, calculated by
a difference between pointers (or iterators), then you should be
using ptrdiff_t. About the only time size_t is appropriate is
as an argument to malloc.
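
A few compile-time checks illustrate the distinction between the two types. This is a sketch, and the helper `beginMinusEnd` is made up for the example.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Subtracting two pointers yields std::ptrdiff_t, a signed type.
static_assert(std::is_same<decltype(std::declval<int*>() -
                                    std::declval<int*>()),
                           std::ptrdiff_t>::value,
              "pointer difference is ptrdiff_t");

// The difference type of a standard container is signed as well...
static_assert(std::is_signed<std::vector<int>::difference_type>::value,
              "iterator difference is signed");

// ...whereas size_t is unsigned.
static_assert(std::is_unsigned<std::size_t>::value, "size_t is unsigned");

// An iterator difference can legitimately be negative.
inline std::ptrdiff_t beginMinusEnd(const std::vector<int>& v) {
    return v.begin() - v.end();
}
```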

That's because mixing signed and unsigned in C++ leads to a
lot of problems, which is added work, which added work may not
even catch all the errors.

Interestingly enough, part of the problem, at least, is that
unsigned has a larger range

James Kanze · Apr 5, 2009

Well I do have some 8 and 16 bit embedded development boards I
could power up....

Historically (and this does go back some), some 16 bit systems
used a segmented architecture, in which a user process could
have up to 640KB memory, but the maximum size of a single object
(or array) was 64KB, and size_t was 16 bits. In such systems,
there is an argument concerning the addressability; making
size_t signed effectively does divide the largest size of a byte
array by 2, and it isn't that unreasonable to imagine an
application which deals with byte arrays larger than 32KB, even
on such a system. Whether supporting the additional range is
worth the hassles it causes (due to mixing of signed and
unsigned types) is very debatable, but the fact that Stepanov
originally developed the STL on such a system is probably not
foreign to his choice of size_t for indexes.

Today, of course, you won't find such things other than in
embedded systems, and I'm not sure whether such issues are
relevant in them.

Alf P. Steinbach · Apr 6, 2009

* James Kanze:

Technically, it's not a built-in type, but a typedef to a
built-in type.

Technically that depends on the definition of "built-in", in particular whether
the type is provided by standard C++ or only by the implementation.

But regarding the typedef, that's the same as I wrote, so it's just quibbling.

Below it seems you have no problems parsing a very similar sentence:

Yes and no.

Cheers & hth.,

- Alf

Jorgen Grahn · Apr 22, 2009

You won't become any better of a C++ programmer if you discuss whether
you should use the word "get" in getter method names or not any more
than you will if you discuss eg. whether 2 or 4 spaces of indentation is
better or whether you should use camel-case in variable names.

Those types of discussion just aren't useful nor relevant. They are
completely a matter of taste, and your program will not become any
better or worse depending on it (as long as you use a clear style and
you use it consistently).

I respectfully disagree. It's a good thing if I use a clear,
consistent style *which I share with other C++ programmers*. This
group is a good place to pick up such style issues. For the word
"get", I already /know/ people (not just Alf) have strong feelings
about it -- not to mention the presence of get/set methods in C++
class design in general.

The code I'm working on now uses ALLUPPERCASE for class names. That is
both clear and consistent -- but whoever invented that style obviously
lived in a cave, isolated from other C++ programmers. I don't want to
be that guy. I don't want every programming project I enter to be its
own tiny C++ subculture.

/Jorgen

Tony · Apr 29, 2009

James said:
There's a fundamental problem with signed. Suppose I have an
index into an array, and a function which, given that index,
returns how many elements forward or back I should move.

You mean like this, e.g., (Example A):

int64 GetRelativePositionToMoveTo(uint32 index);
void MoveRelative(int64 relative_position);

With
unsigned indexes, the function must return some sort of struct,
with a flag indicating whether the offset is positive or
negative, and the calling code needs an if.

???

(That's not a bad idea BTW: see aside note below).

With signed
indexes, no problem---the function just returns a negative value
to go backwards.

The "only" thing using a signed index gets you (design-wise, i.e.) is ...
not much (Example B):

int32 GetRelativePositionToMoveTo(int32 index);
void MoveRelative(int32 relative_position);

Now, instead of the "impedance mismatch" between the index width/range and
the movement width/range of Example A, you have an "impedance mismatch"
between the signed index argument and the common-sense notion of "index"
which is unsigned.

(Aside: A relative movement has magnitude AND direction and is therefore
fundamentally different from an index. A class representing this may not be
a bad idea indeed and then the impedance mismatches go away entirely and the
design is then clean/clear.)
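
One possible shape for such a class is sketched below. The name `Displacement` and its interface are assumptions made for this illustration, not something proposed in the thread.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical type: a relative movement with magnitude and direction.
class Displacement {
public:
    static Displacement forward(std::size_t n)  { return Displacement(n, false); }
    static Displacement backward(std::size_t n) { return Displacement(n, true); }

    // Apply the movement to an unsigned index, refusing to move
    // before element 0 instead of silently wrapping around.
    std::size_t applyTo(std::size_t index) const {
        if (!backward_) return index + magnitude_;
        if (magnitude_ > index) throw std::out_of_range("move before start");
        return index - magnitude_;
    }

private:
    Displacement(std::size_t magnitude, bool backward)
        : magnitude_(magnitude), backward_(backward) {}

    std::size_t magnitude_;
    bool backward_;
};
```

The index stays unsigned and the direction check lives in one place, though each movement still costs a branch inside `applyTo`.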

Tony

Tony · Apr 29, 2009

James said:
They are a subset of the natural numbers (cardinals), and the
natural numbers are a subset of integers. C++ has a type which
sort of approximates integers; it doesn't have a type which
approximates cardinals. The special characteristics of unsigned
types mean that they are best limited to raw memory (no
calculations), bit maps and such (only bitwise operations) and
cases where you need those special characteristics (modulo
arithmetic). Generally speaking, when I see code which uses
arithmetic operators on unsigned types, and doesn't actually
need modulo arithmetic, I suppose that the author didn't really
understand unsigned in C++.

How could the issues with unsigned be fixed in the C++ language (or in any
language for that matter)?

Tony

Tony · Apr 29, 2009

Alf said:
It's not dreadful to have size_t as a built-in type.

It's dreadful to have it as an unsigned type.

That's because mixing signed and unsigned in C++ leads to a lot of
problems, which is added work, which added work may not even catch
all the errors.

Or is it just being lazy or unknowing in the design and trying to shoehorn
abstractions into being represented by built-in types when they should
really be classes of their own? If C++ had real typedefs (instead of just
aliases), that would be acceptable in a lot of cases, but since it doesn't,
it's a questionable practice.
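
The difference between an alias and a distinct type can be shown in a few lines. This is a sketch: `ByteCount`, `ElementIndex`, `ElementCount` and `inRange` are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <type_traits>

// A typedef only introduces another name for the same type.
typedef std::size_t ByteCount;
static_assert(std::is_same<ByteCount, std::size_t>::value,
              "a typedef is just an alias");

// Small wrapper classes give genuinely distinct types.
struct ElementIndex {
    explicit ElementIndex(std::size_t v) : value(v) {}
    std::size_t value;
};

struct ElementCount {
    explicit ElementCount(std::size_t v) : value(v) {}
    std::size_t value;
};

static_assert(!std::is_same<ElementIndex, ElementCount>::value,
              "wrappers are distinct types");
static_assert(!std::is_convertible<std::size_t, ElementIndex>::value,
              "no implicit conversion from a raw integer");

// Swapping the arguments at a call site is now a compile-time error.
inline bool inRange(ElementIndex i, ElementCount n) {
    return i.value < n.value;
}
```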

Tony

Tony · Apr 29, 2009

Alf said:
No, there's no waste except for the case of a single byte array
that's more than half the size of addressable memory, which on a
modern system you simply will not ever have. There's no waste because
that extra range isn't used, and cannot be used (except for the
single now purely hypothetical case mentioned).

Would you ever use a signed integer to represent a memory address?

Alf P. Steinbach · Apr 29, 2009

* Tony:

The "only" thing using a signed index gets you (design-wise, i.e.) is ...
not much (Example B):

int32 GetRelativePositionToMoveTo(int32 index);
void MoveRelative(int32 relative_position);

Now, instead of the "impedance mismatch" between the index width/range and
the movement width/range of Example A, you have an "impedance mismatch"
between the signed index argument and the common-sense notion of "index"
which is unsigned.

In other words, instead of a practical problem one has a clash with some ideology.

For me (and I guess also for James) choosing between the two is a no-brainer.

Cheers & hth.,

- Alf

Tony · Apr 29, 2009

Alf said:
* Ian Collins:

And is the concrete example then of mapping a 2 GB file to memory
under Solaris and using indexing instead of pointer arithmetic to
access it?
Well, I grant that it's possible, and so "will not ever have" was too
strong.

What if you're allocating from the top of the virtual memory space down and
doing something with those addresses? The idea of using signed integers to
avoid language idiosyncrasies which results in limiting the range that can
be represented by the platform to half, seems suspect.

Alf P. Steinbach · Apr 29, 2009

* Tony:

How could the issues with unsigned be fixed in the C++ language (or in any
language for that matter)?

I don't know whether there is a general solution.

But as noted in the general debate here, spread over many threads, the cases
where unsigned are relevant for indexing are special systems where the size of a
maximum available chunk of memory is very limited, and on those systems it is
fundamentally a speed trade-off, that of not using a too large size_t.

So it seems to me that any really cross-platform solution would have to
differentiate between two kinds of systems, for some aspects. That may sound
abhorrent, but is already to a large extent the situation with C++ (it's not for
nothing that the standard differentiates between hosted and non-hosted systems).
Currently the differentiation is of the sort where on a non-hosted system you
just may have /less/ of the standard functionality available; I gather that a
solution to the unsigned indexing issue would mean somehow also having some
more, dedicated functionality available, i.e. two different sets of
functionality. But then perhaps we're really talking about two different
languages. It may be that a "one size fits all" language (pun intended) is
not the most practical approach...

Cheers & hth.,

- Alf (speculative)

Tony · Apr 29, 2009

Alf said:
Hm, EC++ is AFAIK dead,

Or sleeping.

Tony · Apr 29, 2009

James said:
Technically, [size_t is] not a built-in type, but a typedef to a
built-in type.

"Technically" because C++ typedefs are just aliases rather than "real
typedefs".

Tony · Apr 29, 2009

SaticCaster said:
The code analyzer Viva64 will simplify migration process!
http://www.viva64.com/viva64-tool/

Expensive.

James Kanze · Apr 29, 2009

How could the issues with unsigned be fixed in the C++
language (or in any language for that matter)?

Modula-2 and Pascal handled it fairly well. Subrange types. An
array is indexed by a subrange type.

(Note that my pre-standard array classes followed the Pascal
model: the client specified both a lower and an upper bound.
And the lower bound---both actually---could be negative. While
I've never found much use for a lower bound greater than zero,
there are a couple of cases where it's useful for the lower
bound to be the complement of the upper bound, with 0 indexing
directly into the middle. In fact, I found I had a number of
cases where the array was indexed by a char, with lower bound
CHAR_MIN, and upper bound CHAR_MAX.)
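
An array in that Pascal style might be sketched as follows. This is not James's actual pre-standard class; the name `BoundedArray`, the use of `long` for bounds, and the `makeCharTable` helper are assumptions for the illustration.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical array whose client picks both bounds, either of which
// may be negative.
template <typename T>
class BoundedArray {
public:
    // Both bounds are inclusive; the caller is assumed to pass lower <= upper.
    BoundedArray(long lower, long upper)
        : lower_(lower), upper_(upper),
          data_(static_cast<std::size_t>(upper - lower) + 1) {}

    T& operator[](long i) {
        if (i < lower_ || i > upper_)
            throw std::out_of_range("index outside bounds");
        return data_[static_cast<std::size_t>(i - lower_)];
    }

    std::size_t size() const { return data_.size(); }

private:
    long lower_;
    long upper_;
    std::vector<T> data_;
};

// A table indexed directly by char, whether or not plain char is signed.
inline BoundedArray<int> makeCharTable() {
    return BoundedArray<int>(CHAR_MIN, CHAR_MAX);
}
```

With a table from `makeCharTable()`, `table[c]` works for any `char c` without casting to `unsigned char` first.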
