64 bit C++ and OS defined types


Ian Collins

Bart said:
And what if the local coding rules state that "The name of a function
must start with an action verb", which is not an uncommon rule.
What verb do you propose to use instead of Get?

Then the issue becomes one of design. If the natural verb for an action
performed by a method is "get", that is indicative of a design smell.
If it were "calculate" then the method does something useful.
 

Ian Collins

Bart said:
How would changing 'unsigned' to 'UINT' help you in the example below?
Most likely, 'UINT' is a typedef for 'unsigned int' anyway.

Why use a non-portable type when we have size_t?

Keep the platform specific details of your size type out of the user
code and in the platform specific headers.

If range (and to keep Alf happy, range checking) is important, give the
container its own range type.

class MySpecialContainer
{
public:
    typedef <your choice> size_type;
    ...
};

Then use this type for all size related returns.

This could be expanded to something like:

class MySpecialContainer
{
public:
#if DEBUG_MODE
    typedef SomeBoundedType size_type;
#else
    typedef size_t size_type;
#endif
    ...
};
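For illustration only, one possible shape of such a bounded type; SomeBoundedType above is just a placeholder name, and the bound used below is arbitrary, chosen purely for the sake of the sketch:

#include <cstddef>
#include <stdexcept>

// Hypothetical sketch of a range-checked size type; not from the original post.
class SomeBoundedType
{
public:
    SomeBoundedType( std::size_t value ) : value_( value )
    {
        if ( value_ > 1000000u )   // arbitrary bound, purely for the example
            throw std::out_of_range( "size outside the expected range" );
    }

    operator std::size_t() const { return value_; }

private:
    std::size_t value_;
};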
 

Christopher

I can't go and quote the volume of replies. So I'm just going to
mention things as I go.

It seems Alf is claiming there are inherent problems with indexing
using an unsigned type. However, I am unclear on what those specific
problems are. I am not arguing against it, but I haven't seen anything
mentioned that points them out. I did see mention of comparing signed
and unsigned, but I fail to see how that is relevant: I'd just cast
the unsigned value to a signed type before the comparison, right?
Alf assumes the problems are obvious. They are not obvious to me, but
then Alf has more experience than I do.

I also read someone questioning my statement that the Windows APIs are
going to expect the size back as a UINT, which they made 32 bits.
Well, not directly. They ask for the number of elements as a parameter
to a number of functions. Naturally the number of elements is going to
come from my std container, since that is what I am using to store
elements. So, inevitably, I will have to take my 64 bit value
representing the number of elements in my STL container and change it
to a 32 bit value somewhere along the line. I can move the problem
wherever I wish, but the problem is still there.

If I use size_t, that is fine until I have to convert it to a 32 bit
value in order to pass it to an API call. See above.

I also saw the suggestion to use assert, which sounds good to me. I
could assert and then cast. My only worry is that assert only shows up
in debug builds, right? Perhaps I should just write a utility function
that casts a 64 bit value to a 32 bit value, checks the bounds, and
throws an exception when the number doesn't fit.
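Such a helper might look roughly like the sketch below; the name toUint32 is made up for the example, and it assumes the target API type is a 32-bit unsigned int, as with Windows' UINT:

#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical checked narrowing helper, illustration only: throws if a
// 64 bit size does not fit into a 32 bit unsigned value.
inline unsigned int toUint32( std::size_t value )
{
    if ( value > std::numeric_limits<unsigned int>::max() )
        throw std::out_of_range( "size does not fit in 32 bits" );
    return static_cast<unsigned int>( value );
}

An API call would then take toUint32( container.size() ) instead of a raw cast.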

I am sure it will be a rare case, if it comes up at all. However, I
want my code to be very good. I plan to use it for demonstration while
job hunting. I don't want anyone to look at it and think, "look at this
guy, he clearly ignored these cases, not very thorough".
 

Juha Nieminen

Jorgen said:
Yes. If we cannot talk about such things here, where should we do it?
I am here to become a better C++ programmer and having more fun while
doing it, by hearing what other people do, what they don't do, what
they like and so on.

You won't become any better of a C++ programmer if you discuss whether
you should use the word "get" in getter method names or not any more
than you will if you discuss eg. whether 2 or 4 spaces of indentation is
better or whether you should use camel-case in variable names.

Those types of discussion just aren't useful nor relevant. They are
completely a matter of taste, and your program will not become any
better or worse depending on it (as long as you use a clear style and
you use it consistently).

Things which are not really part of the C++ standard but which *can*
concretely improve your C++ programming are things like "you should
usually avoid allocating things with 'new' if an STL container suffices
equally well for the same task" and such. There are certain programming
practices which will help you write better C++ programs. However,
whether or not you should use "get" in a getter is not such a practice.
 

Alf P. Steinbach

* Juha Nieminen:
You won't become any better of a C++ programmer if you discuss whether
you should use the word "get" in getter method names or not any more
than you will if you discuss eg. whether 2 or 4 spaces of indentation is
better or whether you should use camel-case in variable names.

Those types of discussion just aren't useful nor relevant. They are
completely a matter of taste, and your program will not become any
better or worse depending on it (as long as you use a clear style and
you use it consistently).

Things which are not really part of the C++ standard but which *can*
concretely improve your C++ programming are things like "you should
usually avoid allocating things with 'new' if an STL container suffices
equally well for the same task" and such. There are certain programming
practices which will help you write better C++ programs. However,
whether or not you should use "get" in a getter is not such a practice.

It seems you choose to have a very limited view of software engineering.

The idea of referential transparency for getters was and is one fundamental idea
of the Eiffel language. In Eiffel you can freely (by design of the language)
change the representation of a data member from member variable to accessor
function and back, without affecting client code. Now think how easy or not that
is if you have to keep renaming the thing all the time to comply with a silly
requirement to have an utterly redundant prefix or suffix on one form, and think
of whether a main influence of a language like Eiffel can be irrelevant.

Such prefixes that indicate a type or implementation aspect are the fundamental
idea of Hungarian notation. It will increase your and others' efficiency if you
stay away from Hungarian notation (which you probably already do). Think about
whether that is irrelevant to C++ programming. Then, but think about it first!,
apply that insight not only to Hungarian notation but also to other
manifestations of the same idea, which is counter-productive in modern C++,
like, for example, "Get" prefixes. Then, but think about it first!, think also
about why the situation is different in e.g. Java, i.e., why this ties in
specifically to C++ programming.


Cheers & hth.,

- Alf (unfortunately, can only show rough map of terrain, you have to walk it)
 

James Kanze

I thought this newsgroup was about the C++ language, not about
programming guidelines.

Programming guidelines for C++ are relevant. The problem with
Alf's comments concerning Get is not that he's necessarily
wrong; it's more or less an open issue, and he has a right to
his opinion. (Also, I agree with him to a point, so he can't be
all wrong.) The problem is that when reading his posts, he
seems to be putting the use of "get" in the name on the same
level as, say, dereferencing a null pointer. The difference
between "I find using `get' to be a bad idea", "It's generally
accepted that names in all caps should be reserved for macros",
and "This isn't legal C++" isn't coming across.
Suggesting to avoid using "get" in a getter method is as
superfluous here as suggesting eg. that one should use reverse
polish notation or how many spaces should be used for
indentation. It's a matter of style, not a matter of whether
it's a standard C++ feature.

Which doesn't mean it can't be discussed. It does mean,
however, that he should make it clear that 1) it's not a problem
on the same level as e.g. dereferencing a null pointer, or even
using all caps for a variable name, and 2) in this case, it
happens to be his particular opinion---not all experts agree.
How do you suggest a program being able to use the entire
address space of the target architecture with a signed
indexing type? Even if you can address the entire address
space with a signed type (as the CPU will probably internally
use the value as an unsigned type), you still get problems
when comparing indices with < when the indices go over the
half-way mark.
Or are you going to say "nobody will ever need more than half
of the address space of any computer architecture"?

No. He's saying that most programs won't need one single byte
array that is larger than half the address space. And that
using "unsigned" to gain address space is a losing proposition;
if you need between 0.5 and 1 times the address space today (for
a single object), you'll need more than 1 times tomorrow, so you
might as well prepare for it.
 

gw7rib

Alf - I didn't follow some of your argument - do you think you could
expand on it a bit, perhaps in more moderate language?

* Christopher:
(snip)

And even though the standard library has size methods with unsigned result type,
also that is an abomination. It's just sh*tty design, very much so, to use a C++
built-in type,

size_t seems the ideal type for returning a value which indicates how
many of something you've got, as this is more or less what it was
designed for. (Or do you not count size_t as "built-in"?)

Why is it so dreadful for the standard library to return built-in
types, or are you not actually objecting to this per se?
where there's no guaranteed overflow checking,

Unsigned types don't check for overflow, but they do produce defined
results when they do overflow.

Signed types generally don't check for overflow either, and it is
undefined behaviour if they do overflow. Why is this an improvement?
to indicate some
limited value range (technically: (1) it has no advantages, and (2) it has lots
of problems, including very serious ones,

Agreed. Try googling for "{23,34,12,17,204,99,16};" I take it this is
the sort of thing you had in mind?
so (3) it's just sh*tty). Instead use
a signed type such as 'int' or 'long' or, if you absolutely must, 'ptrdiff_t'.

Why do you want a signed type to indicate a quantity, which can't be
negative? Aren't you wasting half its potential values?
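(For reference, the code behind that array literal is presumably the well-known puzzle below, reconstructed here rather than copied from the thread: it is meant to print every element but prints nothing, because sizeof yields a size_t and drags the signed counter into an unsigned comparison.)

#include <cstdio>

int array[] = { 23, 34, 12, 17, 204, 99, 16 };
#define TOTAL_ELEMENTS (sizeof(array) / sizeof(array[0]))

int main()
{
    // TOTAL_ELEMENTS has type size_t, so -1 is converted to a huge
    // unsigned value in the comparison and the loop body never runs.
    for ( int d = -1; d <= TOTAL_ELEMENTS - 2; ++d )
        std::printf( "%d\n", array[d + 1] );
    return 0;
}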
 

James Kanze

Juha said:
Alf P. Steinbach wrote: [...]
Suggesting to avoid using "get" in a getter method is as
superfluous here as suggesting eg. that one should use
reverse polish notation or how many spaces should be used
for indentation. It's a matter of style, not a matter of
whether it's a standard C++ feature.
It's actually relevant, because naming conventions when
programming in C++ might be specific to the language, for
example to fit in with the naming conventions used in the C++
standard library. In this case for example, the library uses
things like size() to get the size, resize() to change it,
thus one might want to avoid calling something get_size() in
C++.

This would be a stronger argument if the library were even
half-way consistent. Following Alf's reasoning elsethread
(which I basically agree with, by the way, although I think he's
overstating the issue), the argument is fundamentally between:
int size() const ;
void size( int newSize ) ;
and
int getSize() const ;
void setSize( int newSize ) ;
I definitely prefer the former, but...

The code will work just as well, and in fact be just as readable
with the latter. And if all of the existing code uses the
latter, or if the majority of your collegues prefer the latter,
it's better to be consistent, rather than have some use one, and
some another. (And even if I think they're wrong to prefer the
latter, there are a lot more important things to convince them.)
On most machines, an unsigned size_t is only necessary for
indexing arrays of char. In other cases, ptrdiff_t can hold
all valid indices. One should prefer a signed type in most
cases, since arithmetic using it behaves more normally.

Exactly. In this case, there is a reasonably strong technical
argument for using int as an index, instead of unsigned. On the
other hand, there are even stronger technical reasons for not
mixing signedness, and since the standard library got it wrong,
you're often stuck with size_t, when you shouldn't be.
(Logically, of course: an index is the difference between two
addresses. And the difference between two addresses is a
ptrdiff_t, not a size_t.)
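A minimal illustration of that last point (an added example, not from the post): subtracting two pointers can yield a negative value, which is exactly why ptrdiff_t is signed.

#include <cstddef>

int main()
{
    int a[10];
    int* p = a + 2;
    int* q = a + 7;

    std::ptrdiff_t forward  = q - p;   //  5
    std::ptrdiff_t backward = p - q;   // -5: pointer differences can be
                                       // negative, so the type must be signed
    return static_cast<int>( forward + backward );   // 0
}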
 

Goran

2. in practice, an underflow with unsigned on raw arrays and some
This is pretty unclear, but unsigned opens the door for more bugs, so this
argument about probability of detecting those bugs is pretty lame. :)

Why? It's a classic application of "fail fast" at work: going into an
array with -x __happens__. E.g. bad decrement somewhere gives you -1,
or, bad difference gives (typically small!) -x. Now, that typically
ends in reading/writing bad memory, which is with small negatives
detected quickly only if you're lucky. If, however, that decrement/
subtraction is done unsigned, you typically explode immediately,
because there's a very big chance that memory close to 0xFFFF... ain't
yours.
The problems with unsigned types are well known.

Your compiler, if it's any good, will warn you about unsigned/signed
comparisons. Those warnings are serious. Where you have such a type mismatch
(which results from unsigned) you often have a bug.

True, but why are signed and unsigned mixed in the first place? I say,
because of the poor design! IOW, in a poor design, it's bad. So how
about clearing that up first?
Your compiler cannot, however, warn you about arithmetic problems.

True, but they exist for signed types, too. Only additional problem
with unsigned is that subtraction is more tricky (must know that a>b
before doing a-b). But then, I question the frequency at which e.g.
sizes are subtracted. And even then (get this!), it's fine. Result is
__signed__ and it all works. (Hey, look! Basic math at work: subtract
two natural numbers and you don't get a natural number!) Well, it
works unless you actually work on an array of bytes, but that example
is contrived and irrelevant, I mighty agree with you there.

I also question the relevance of signed for subtraction of indices,
because going into an array with a-b where a<b is just as much of a
bug as with unsigned. So with signed, there has to be a check (if (a-
b>=0)), with unsigned, there has to be a check (if (a>b)). So I see no
gain with signed, only different forms.
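To put the two forms side by side (an illustrative sketch with a, b and v as stand-ins; no claim here about which is better):

#include <cstddef>
#include <vector>

int diffSigned( const std::vector<int>& v, std::ptrdiff_t a, std::ptrdiff_t b )
{
    std::ptrdiff_t d = a - b;      // subtract first, then check the sign
    if ( d < 0 )
        return -1;
    return v[static_cast<std::size_t>( d )];
}

int diffUnsigned( const std::vector<int>& v, std::size_t a, std::size_t b )
{
    if ( a < b )                   // check the order first, then subtract
        return -1;
    return v[a - b];
}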
There's a host of bug vectors in that, including the main example of loop
counting down (incorrectly expressed).

Hmmm... But I see only one vector: can't decrement before checking for
0.

So the two dangers above can take many forms, but honestly, how
difficult is it for someone to grasp the concept? I say, not very.

You claim that these potential bugs are important. I claim that they
are not, because I see very little subtraction of indices in code I
work with, and very little backwards-going loops. That may be
different for you, but I'll still wager that these are overall in low
percentiles.

You also conveniently chose to overlook (or, worse yet, call it
hand-waving) the true nature of a count and an index (they are natural
numbers). I can't see how designing closer to reality can be
pointless.

And so I have to tell you what somebody already told you here: you
seem to adhere to "anything that makes your point weaker is "grossly
irrelevant". Anything that supports your point is, however, relevant."

Goran.
 

James Kanze

* Juha Nieminen:

[...]
It is bad style in C++ precisely because C++ doesn't have any
language feature to make use of it (Java does have such a
feature).

I'm not sure what you mean by "feature" here. As I see it,
there are two alternatives:
int size() const ;
void size( int newSize ) ;
and
int getSize() const ;
void setSize( int newSize ) ;
Both work, and both result in readable code. I prefer the
first, but the difference isn't enormous.

And both work equally well in Java as in C++. (And in
beginning, the Java library was sometimes inconsistent in its
choice as well.)
That's a misunderstanding, sort of urban legend, unfortunately
still bandied about as if it were meaningful.
It isn't meaningful.
For in order to make use of the extra range of unsigned you
need a /character/ (byte) array larger than one half of the
address space.

A single byte array. You generally can't have two of them.

Unless you're on a system which has a segmented address
architecture (e.g. an Intel processor), under a system which
actually uses it (not Windows or Linux). On a 16 bit Intel,
under MS-DOS, it did occasionally happen that people needed byte
arrays of e.g. 50000 bytes. And you could have several of them
(in different segments), even though a single object couldn't be
more than 2^16 bytes in size. The same thing would be true
today, on a 32 bit Intel, if you had a decent OS for it (instead
of Windows or Linux), although in a very real sense, there's a
much greater difference between 2^15 and 2^16 than between 2^31
and 2^32.
For 32-bit programs you don't even have that address space
available in Windows. For 64-bit programs, when was the last
time you needed a 2^63 bytes character array? And can you cite
any common computer with that much memory? I guess there might
be a Cray... Then, you're talking about /always/ using
unsigned indexing in order to support the case of using a Cray
to address a larger than 2^63 character array. Hello.

Historically, C (and early C++) ran on 16 bit machines. Some of
which could effectively address more than 2^16 bytes, just not
in the same object. Historically---today, I don't think that
it's really relevant.

Even historically, however: if p and q are pointers into the
same array, and p-q doesn't give you the number of elements
between the two (and isn't negative if p points to an element
after q), then a lot of things (at least in C and C++) break.
Using an unsigned type as an index is a serious design flaw in C
and C++. About the only thing worse is mixing signed and
unsigned types in the same role and/or expression.
Thus, when reduced to a concrete issue rather than hand
waiving, it's not meaningful at all, just technically
bullshit. :)
It's best forgotten!
That's a fallacy. Using signed indexing doesn't mean you can't
use that much. It means that if you need that much memory and
can't reach it via indexing, then it is necessarily for a
character array that large. The OP's code will never be used
for a character array that large. Nor will my code or yours. I
think. :)

Having argued with you up until now:).

I can think of one exception: mmap'ing a very large text file in
a relatively simple program. (But although I'd treat the file
as a single, large array, I doubt that I'd use indexes into it.)
 

Alf P. Steinbach

* James Kanze:
* Juha Nieminen:
[...]
Suggesting to avoid using "get" in a getter method is as
superfluous here as suggesting eg. that one should use
reverse polish notation or how many spaces should be used
for indentation. It's a matter of style, not a matter of
whether it's a standard C++ feature.
It is bad style in C++ precisely because C++ doesn't have any
language feature to make use of it (Java does have such a
feature).

I'm not sure what you mean by "feature" here.

Introspection. Which makes it possible to create tools that depend on a certain
naming convention, tools that let you treat a "component" class very generally,
including e.g. design time manipulation. With support from the class!

And the original convention for that in Java was called "JavaBeans".

Quoting Wikipedia on beans: "The class properties must be accessible using get,
set, and other methods (so-called accessor methods and mutator methods),
following a standard naming convention. This allows easy automated inspection
and updating of bean state within frameworks, many of which include custom
editors for various types of properties.".

As I see it,
there are two alternatives:
int size() const ;
void size( int newSize ) ;
and
int getSize() const ;
void setSize( int newSize ) ;
Both work, and both result in readable code. I prefer the
first, but the difference isn't enormous.

And both work equally well in Java as in C++. (And in
beginning, the Java library was sometimes inconsistent in its
choice as well.)

Actually you're right that I did put things to a point. I personally prefer a
mixture, with the "set" prefix. That's because of a preference for readability
and my very subjective opinion of what constitutes readability, he he. :)

And that ties in with the fact that the one practical and very C++ specific
benefit of avoiding the prefixes has only to do with "get", not with "set".

Namely, supporting letting the client code choose whether to manually optimize
(awkward notation) or not (especially when the compiler does it), by doing

void getPopulationData( Container& c )
{
    Container result;
    ...
    result.swap( c );
}

Container populationData()
{
    Container c;
    getPopulationData( c );
    return c;
}

Here client code will preferentially use "populationData", relying on RVO for
the cases where efficiency matters.

If it turns out that the compiler isn't up to the task and measurements show
that the efficiency of these calls does matter a lot, then client code can fall
back to using getPopulationData, in the place or places where it affects performance.
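What that choice looks like at a call site, as a rough sketch (Container here is just a stand-in typedef, not something from the post):

#include <vector>

typedef std::vector<int> Container;        // stand-in for the real Container

void getPopulationData( Container& c );    // as above
Container populationData();                // as above

void client()
{
    // Normal call sites rely on RVO / copy elision:
    Container data = populationData();

    // A call site where measurement showed the copy really matters:
    Container hot;
    getPopulationData( hot );
}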


Cheers,

- Alf
 

Alf P. Steinbach

* Goran:
Why? It's a classic application of "fail fast" at work: going into an
array with -x __happens__. E.g. bad decrement somewhere gives you -1,
or, bad difference gives (typically small!) -x. Now, that typically
ends in reading/writing bad memory, which is with small negatives
detected quickly only if you're lucky. If, however, that decrement/
subtraction is done unsigned, you typically explode immediately,
because there's a very big chance that memory close to 0xFFFF... ain't
yours.

Uhm, I didn't comment on that because it wasn't necessary given that the
argument was based on detecting bugs caused by signed/unsigned problems.

But consider a signed index that is negative, corresponding to a large unsigned
value:

If (1) the C++ implementation is based on unchecked two's complement (which is
the usual), then the address computation yields the same as with unsigned index.
So, no advantage for unsigned.

If the C++ implementation isn't based on unchecked two's complement, then either
(2) you get the same as with unsigned index (no advantage for unsigned), or (3)
you get a trap on the /arithmetic/.

So in all three possible cases unsigned lacks any advantage over signed.

On this basis -- not from data, for I can't recall any experience with code
that supplies a negative index (or the corresponding huge unsigned value) --
but from pure logic, which is a stronger argument, I do question your statement
that unsigned "leads to an earlier crash". The logic seems to dictate that that
simply cannot be true, unless the compiler is perverse. So I'd need to see some
pretty strong evidence to accept that it isn't totally wishful thinking.


True, but why are signed and unsigned mixed in the first place? I say,
because of the poor design! IOW, in a poor design, it's bad. So how
about clearing that up first?

Yes, that's one thing that signed sizes can help with (the main other thing
being cleaning up redundant and unnaturally structured code, like removing casts).

However, as remarked else-thread, since the standard library unfortunately uses
unsigned, "can help" isn't necessarily the same as "will help".

If applied mindlessly it may exacerbate the problem instead of fix it. But then,
so it is with all things. Needs to be done with understanding. :)


True, but they exist for signed types, too. Only additional problem
with unsigned is that subtraction is more tricky (must know that a>b
before doing a-b).

Yes, that's a major problem, because the 0 limit is well within the most often
occurring set of values.

As opposed to the limits of signed, which are normally way outside that set.

Thus, the 0 limit of unsigned is one often encountered (problematic), while the
limits of signed are not so often encountered (much less problematic).
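A common shape of that 0-limit problem (an illustrative example, not taken from the thread):

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;                          // empty

    // v.size() - 1 is unsigned arithmetic: 0 - 1 wraps around to SIZE_MAX,
    // so instead of running zero times the loop attempts billions of
    // iterations and reads far out of bounds.
    for ( std::size_t i = 0; i < v.size() - 1; ++i )
        std::cout << v[i] << '\n';

    return 0;
}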

But then, I question the frequency at which e.g.
sizes are subtracted. And even then (get this!), it's fine. Result is
__signed__ and it all works. (Hey, look! Basic math at work: subtract
two natural numbers and you don't get a natural number!)

ITYM, "Result is __unsigned__". And yes, that works as long as you keep within
unsigned. The problem is that almost everything else is signed, so keeping within
unsigned is in practice a real problem, and that's where the nub is.

Well, it
works unless you actually work on an array of bytes, but that example
is contrived and irrelevant, I mighty agree with you there.

Ah. :)

I also question the relevance of signed for subtraction of indices,
because going into an array with a-b where a<b is just as much of a
bug as with unsigned. So with signed, there has to be a check (if (a-
b>=0)), with unsigned, there has to be a check (if (a>b)). So I see no
gain with signed, only different forms.

It's not so much about that particular bug. I haven't ever encountered it,
unless I did in my student days. It's much more about loops and stuff.

But regarding that bug, if for the sake of argument it's assumed to be a real
problem, then see above: it seems signed has the advantage also there... ;-)

There's a host of bug vectors in [arithmetic], including the main example of loop
counting down (incorrectly expressed).

Hmmm... But I see only one vector: can't decrement before checking for
0.

Well, above you talked about using unsigned-only arithmetic and how that works
out nicely when keeping to unsigned. And yes it does work out well using only
unsigned arithmetic. But now you're talking about /checking/ for 0, which
implies that somehow, the result will be mixed with signed -- which is often
the case, it often will be -- which defeats the earlier argument.

The loop example (well known, well-known solutions also, except that I seem to
recall that Andrew Koenig had a very elegant one that baffled me at the time,
like how could I not have thought of that, and now I can't remember it!):

for( size_t i = v.size()-1; i >= 0; --i )

This is the natural expression of the loop, so any fix -- which is easy --
adds work, both in writing it and in grokking it later for maintenance.
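For completeness, two of the well-known fixes (a sketch, and not necessarily the elegant Koenig version alluded to above):

#include <cstddef>
#include <vector>

void countDownUnsigned( const std::vector<int>& v )
{
    // Test, then decrement: the body sees size()-1 down to 0 and the
    // loop terminates cleanly even when v is empty.
    for ( std::size_t i = v.size(); i-- > 0; )
    {
        /* use v[i] */
    }
}

void countDownSigned( const std::vector<int>& v )
{
    // Or keep the index signed, converting the size once up front.
    for ( std::ptrdiff_t i = static_cast<std::ptrdiff_t>( v.size() ) - 1; i >= 0; --i )
    {
        /* use v[i] */
    }
}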

Another arithmetic example (I'm sorry my example generator is sort of out of
commission, so this is not a main example, just one that I remember):

for( size_t i = 0; i < v.size()*step; i += step )

Uh huh, if 'step' is signed and negative then it's promoted to unsigned in the
arithmetic expression, and then for non-zero v.size() the loop iterates at least
once.

Again, solutions are well known.
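One possible fix (a sketch, not from the post) is to do the bound computation in signed arithmetic, so that a negative step yields a negative bound and the loop simply doesn't run:

#include <cstddef>
#include <vector>

void stridedLoop( const std::vector<int>& v, std::ptrdiff_t step )
{
    const std::ptrdiff_t limit = static_cast<std::ptrdiff_t>( v.size() ) * step;
    for ( std::ptrdiff_t i = 0; i < limit; i += step )
    {
        /* use i */
    }
}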

But they have to be applied (and, just as importantly, it has to be recognized
in each and every case that a solution needs to be applied), which is more
work, both originally and for maintenance, and makes for less correct software.

And so on.

So the two dangers above can take many forms, but honestly, how
difficult is it for someone to grasp the concept? I say, not very.

Judging from experience and discussions here, it /is/ difficult for many to
grasp the concepts of unsigned modulo 2^n arithmetic.

But that's not the primary problem.

The primary problem is the ease of introducing pitfalls and the added work. But
could one perhaps rely on humans catching mistakes and doing everything right?
Well, think about how often you catch an error by /compiling/.

You claim that these potential bugs are important. I claim that they
are not, because I see very little subtraction of indices in code I
work with, and very little backwards-going loops. That may be
different for you, but I'll still wager that these are overall in low
percentiles.

I'm sorry, but the notion that all mixing of signed and unsigned happens in
indexing and count-down loops is simply wrong. Above is one counter-example.
Happily, modern compilers warn about some other examples such as signed/unsigned
comparisons, but e.g. Pete Becker has argued earlier in this group that trying
to achieve warning-free compilation is futile in the context of developing
portable code and should not be a concern, and so I gather many think that.

You also conveniently chose to overlook (or, worse yet, call it
hand-waving) the true nature of a count and an index (they are natural
numbers). I can't see how designing closer to reality can be
pointless.

The correspondence you point out, but misunderstand, is /worse/ than pointless
in C++ (although not in some other languages).

In C++ the correspondence is

endpoints of basic value range [correspond to] endpoints of restricted range

The value range on the left is one of modulo 2^n arithmetic. Its endpoints are
not barriers; they are not values that shouldn't be exceeded. On the contrary,
in your arguments above you make use of the fact that exceeding those values is
well defined in C++, a feature to be exploited, "don't care" arithmetic (of
course with the catch that this implies no mixing with signed values).

The value range on the right is, on the other hand, one whose endpoints
constitute barriers.

Exceeding those barriers is an error.

So the correspondence, such as it is, is one of comparing, to the right, the
/numerical value/ of a barrier (exceeding which is an error) to, on the left,
the /numerical value/ of a wrap-around point (exceeding which is a feature to be
exploited), and disregarding the nature of the points so compared.

One can't have both, both error and feature to be exploited. So it's not
identity, it's not "closer". It's just a coincidence of numerical values, and
when you confuse the kinds of ranges they stem from you introduce bugs.
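A two-line illustration of that difference (an added example): crossing the unsigned "endpoint" is not treated as an error at all.

#include <iostream>

int main()
{
    unsigned int count = 0;
    --count;                      // not an error: silently wraps around
    std::cout << count << '\n';   // typically prints 4294967295 (32-bit unsigned int)
    return 0;
}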

And so I have to tell you what somebody already told you here: you
seem to adhere to "anything that makes your point weaker is "grossly
irrelevant". Anything that supports your point is, however, relevant."

I'm sorry but that's just innuendo.


Cheers & hth.,

- Alf
 

Alf P. Steinbach

* gw7rib:
Alf - I didn't follow some of your argument - do you think you could
expand on it a bit, perhaps in more moderate language?

Heh. Well, it's a good thing about a non-moderated group that we can have more
colorful discussions, to say what we Really Mean, offensive or not! :) And
even sometimes we can stray into off-topic land. It broadens the scope, and I
believe diversity is good in and of itself. The moderated group provides other
benefits, such as the absence of spam, pure off-topic discussion and other
"noise", as well as a higher frequency of participation of Real Experts -- but
regarding colorful language it is unfortunately only apparently an advantage,
because, for example, it's much /harder/ to defend oneself against insinuations
hidden in deep implications and a very intelligent person's non-offensive
language (so that it passes moderation) than when the response is more direct.

size_t seems the ideal type for returning a value which indicates how
many of something you've got, as this is more or less what it was
designed for. (Or do you not count size_t as "built-in"?)

Why is it so dreadful for the standard library to return built-in
types, or are you not actually objecting to this per se?

It's not dreadful to have size_t as a built-in type.

It's dreadful to have it as an unsigned type.

That's because mixing signed and unsigned in C++ leads to a lot of problems,
which means added work, and that added work may not even catch all the errors.

Unsigned types don't check for overflow, but they do produce defined
results when they do overflow.

Signed types generally don't check for overflow either, and it is
undefined behaviour if they do overflow. Why is this an improvement?

Because you avoid many or most of the problems of mixing signed and unsigned.

Agreed. Try googling for "{23,34,12,17,204,99,16};" I take it this is
the sort of thing you had in mind?

Yes. :)

Why do you want a signed type to indicate a quantity, which can't be
negative? Aren't you wasting half its potential values?

No, there's no waste except for the case of a single byte array that's more than
half the size of addressable memory, which on a modern system you simply will
not ever have. There's no waste because that extra range isn't used, and cannot
be used (except for the single now purely hypothetical case mentioned).


Cheers & hth.,

- Alf
 

Ian Collins

Alf said:
* (e-mail address removed):

No, there's no waste except for the case of a single byte array that's
more than half the size of addressable memory, which on a modern system
you simply will not ever have.

Why not?
 

Ian Collins

Alf said:
* Ian Collins:

Just try to come up with a /concrete/ example... :)

32bit Solaris?

Any 32 bit OS with memory mapped files and large file support?
 

Alf P. Steinbach

* Ian Collins:
32bit Solaris?

Any 32 bit OS with memory mapped files and large file support?

And is the concrete example then of mapping a 2 GB file to memory under Solaris
and using indexing instead of pointer arithmetic to access it?

Well, I grant that it's possible, and so "will not ever have" was too strong.

But we have to really search to come up with such special cases, and they're not
problematic for the convention of using signed sizes. For when you're doing an
over-half-of-memory file mapping you're on the edge of things and have to deal
with much more serious problems. By comparison, applying the right types for the
special case shouldn't be hard, and anyway shouldn't influence the choice of
types for more normal code: the marginal gain for the special case doesn't
outweigh the serious problems elsewhere (unless Marketing gets into it :) ).


Cheers, & hth.,

- Alf


PS-
I don't have Solaris (James has) but I think the following C code, just grabbed
from the net, would do something like that:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd;
    unsigned long i;
    char *mmap_space;
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned long mmap_size = 3200000000UL;

    if ((fd = open("/dev/zero", O_RDWR)) == -1)
        perror("open"), exit(-1);
    mmap_space = (char*)mmap((caddr_t) 0,
                             mmap_size,
                             (PROT_READ | PROT_WRITE),
                             MAP_PRIVATE,
                             fd,
                             (off_t)0);
    if (mmap_space == MAP_FAILED)
        perror("mmap"), exit(-1);
    (void)close(fd);
    (void)fprintf(stderr, "mmap'd %lu bytes\n", mmap_size);

    /*
     * Just to be thorough, test every page.
     */
    (void)fprintf(stderr, "Testing the %lu mmap'd bytes ...\n", mmap_size);
    for (i = 0; i < mmap_size; i += pagesize)
        mmap_space[i] = 'a';
    (void)fprintf(stderr, "done\n");
    return 0;
}

Have you tried it?
-DS
 

Ian Collins

Alf said:
* Ian Collins:

And is the concrete example then of mapping a 2 GB file to memory under
Solaris and using indexing instead of pointer arithmetic to access it?

Or simply calling malloc( 3*1024*1024*1024 )!

Which would be impossible on a 32 bit system with a signed size_t.
Well, I grant that it's possible, and so "will not ever have" was too
strong.

It generally is!
PS-
I don't have Solaris (James has) but I think the following C code, just
grabbed from the net, would do something like that:
Have you tried it?

My last 32 bit system went to the happy recycling ground last year....
 

Alf P. Steinbach

* Ian Collins:
Or simply calling malloc( 3*1024*1024*1024 )!

Have you tried that?

Not that it has anything to do with the discussion of signed sizes, but the code
I provided was from an article showing how to overcome a reportedly common 2 GB
limit in Solaris.

I guess it depends much on the version.

Which would be impossible on a 32 bit system with a signed size_t.


It generally is!

Yeah, as James Bond reportedly remarked, never say never... ;-)

My last 32 bit system went to the happy recycling ground last year....

So, for you and some others it's already not a practical proposition or even
possible at all to map a file into more than half the available address range
(not even mentioning the matter of processing it at the byte level). :)

Then if I were inclined to word-weaseling I could claim that by "modern system"
of course I meant a 64-bit one.

He he.

But really, I don't think that the argument about "wasting" some address space
holds water at all.

And as I understand it you agree with that and are just playing Devil's
Advocate here (which is good).


Cheers & hth.,

- Alf
 

Ian Collins

Alf said:
* Ian Collins:

Have you tried that?

Not on a 32 bit system.
Not that it has anything to do with the discussion of signed sizes, but
the code I provided was from an article showing how to overcome a
reportedly common 2 GB limit in Solaris.

I guess it depends much on the version.

It probably does; older versions did have a 2GB limit.
So, for you and some others it's already not a practical proposition or
even possible at all to map a file into more than half the available
address range (not even mentioning the matter of processing it at the
byte level). :)

Well I do have some 8 and 16 bit embedded development boards I could
power up....
Then if I were inclined to word-weaseling I could claim that by "modern
system" of course I meant a 64-bit one.

If you exclude modern cell phones, engine management units, toasters.....
But really, I don't think that the argument about "wasting" some address
space holds water at all.

And as I understand it you agree with that and just playing Devil's
Advocate here (which is good).

Well it is Sunday evening :)
 
