size_t problems

Richard Tobin

user923005 said:
I doubt that the chance a string is longer than 2GB is always
negligible.

"Always negligible" is irrelevant. Of course it's not negligible in
programs chosen to demonstrate the problem.

Consider the characters 'C', 'T', 'A', 'G' in various combinations in
a long sequence of (say) 3 billion.
That's the human genome.

The chance of a given program being one that stores the complete human
genome in a string is negligible. People with such programs can set the
option I suggested.

-- Richard
 
user923005

Clearly with strlen() the chance of it being an error is negligible.
And I think this is true of other size_t->int assignments. For example,
int s = sizeof(whatever) is almost never a problem.

Ideally, I would suggest not generating a warning unless some option
is set for it. (There should always be a "maximally paranoid" option
to help track down obscure errors.) But that only applies to
size_t->int assignments. Other 64->32 assignments may be more likely to be
in error. At the point you generate the warning, can you still tell
that it's a size_t rather than some other 64-bit int type?

I doubt that the chance a string is longer than 2GB is always
negligible.

Consider the characters 'C', 'T', 'A', 'G' in various combinations in
a long sequence of (say) 3 billion.
That's the human genome.

The Chrysanthemum genome is much bigger.

I know of people using database systems to do genetics research. The
probability of long character sequences on those systems is not
negligible.

If the machine is capable of handling large data, right away people
will start to do it.
 
Malcolm McLean

Richard Tobin said:
The chance of a given program being one that stores the complete human
genome in a string is negligible. People with such programs can set the
option I suggested.

I work in that field.
Whilst generally you'd want a "rope" type-structure to handle such a long
sequence, there might well be reasons for storing the whole genome as a flat
string. Certainly if I had a 64-bit machine with enough memory installed, I
would expect to have the option, and I'd expect to be able to write the
program in regular C.
 
jacob navia

Richard said:
"Always negligible" is irrelevant. Of course it's not negligible in
programs chosen to demonstrate the problem.


The chance of a given program being one that stores the complete human
genome in a string is negligible. People with such programs can set the
option I suggested.

-- Richard

The program has strings of at most a few K. It is an IDE (Integrated
development environment, debugger, etc)

An int can hold string lengths of more than 2 billion... MORE than
enough for this environment. This program has been running under 32-bit
Windows, where user space is at most 2GB.
 
Keith Thompson

jacob navia said:
2GB strings are the most you can get under the windows schema in 32 bits.

Ok. Does your compiler know that?

Assigning an arbitrary size_t value to an object of type int, if both
types are 32 bits, could potentially overflow. Your compiler
apparently doesn't issue a warning in that case. Is it because it
knows that the value returned by strlen() can't exceed INT_MAX (if so,
well done, especially since it seems to be smart enough not to make
that assumption on a 64-bit system), or is it because it doesn't issue
a warning when both types are the same size?

For example:

    size_t s = func(-1);
    /* Assume func() takes a size_t argument and returns it.
       Assume func() is defined in another translation unit,
       so the compiler can't analyze its definition. In other
       words, 's' is initialized to SIZE_MAX, but the compiler
       can't make any assumptions about its value. */

    signed char c = s;
    /* Presumably this produces a warning. */

    int i = s;
    /* This is a potential overflow. Does this produce
       a warning? Should it? */

If your compiler warns about the initialization of 'c' but not about
the initialization of 'i', then IMHO it's being inconsistent. This
doesn't address your original question, but it's related.

[...]
There isn't any string longer than a few K in this program!
Of course it is a potential bug, but it is practically impossible!

You know that, and I know that, but what matters is what the compiler
knows.

Is it conceivable that a bug in the program and/or some unexpected
input could cause it to create a string longer than 2GB?

You asked how to suppress the bogus warnings without losing any valid
warnings. To do that, your compiler, or some other tool, has to be
able to tell the difference. Telling me that none of the strings are
longer than 2GB doesn't address that concern, unless you can convey
that knowledge to the compiler.
 
jacob navia

Richard said:
But int s = sizeof(char *) is not broken, even though sizeof() returns
a size_t.

-- Richard

If we use size_t everywhere, it is an UNSIGNED quantity.
This means that comparisons with signed quantities will provoke
other warnings, etc etc.

int s = strlen(str) is NOT broken.
 
jacob navia

Malcolm said:
I work in that field.
Whilst generally you'd want a "rope" type-structure to handle such a
long sequence, there might well be reasons for storing the whole genome
as a flat string. Certainly if I had a 64-bit machine with enough memory
installed, I would expect to have the option, and I'd expect to be able
to write the program in regular C.

YES SIR!

With my new lcc-win32 YOU WILL BE ABLE TO DO IT!

But I am not speaking of that program. I am speaking about
other programs I am PORTING from 32 bit, whose strings are never
bigger than a few Kbytes at most!
 
Keith Thompson

Malcolm McLean said:
The campaign for 64 bit ints T-shirts obviously didn't generate enough
publicity. I still have a few left. XXL, one size fits all.

One *shirt* fits all (unless somebody other than you actually wants
one).

There are some good reasons for not making int 64 bits on a 64 bit
machine, which as a compiler-writer you will be well aware of. However
typical computers are going to have 64 bits of main address space for
a very long time to come, so it makes sense to get the language right
now, and keep it that way for the foreseeable future, and not allow
decisions to be dominated by the need to maintain compatibility with
legacy 32 bit libraries.

lcc-win32 (and presumably lcc-win64, if that's what it's called) is a
Windows compiler. jacob does not have the option of changing the
Windows API, and a compiler that's incompatible with the underlying
operating system isn't going to be very useful.
 
jacob navia

Keith said:
lcc-win32 (and presumably lcc-win64, if that's what it's called) is a
Windows compiler. jacob does not have the option of changing the
Windows API, and a compiler that's incompatible with the underlying
operating system isn't going to be very useful.

Yes. Mr Gates decided that

sizeof(int) == sizeof(long) == 4.

Only long long is 64 bits. Please address all flames to him.

NOT TO ME!!!

:)
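A small sketch of how a program could check which layout it was built for. Note that the line above is shorthand: as a C expression, sizeof(int) == sizeof(long) == 4 parses as (sizeof(int) == sizeof(long)) == 4, which compares 0 or 1 against 4. The function name data_model is hypothetical, and the sketch assumes 8-bit bytes; LLP64, LP64 and ILP32 are the usual names for these layouts.

```c
#include <stddef.h>

/* Hypothetical helper: name the data model of the current build.
   Assumes CHAR_BIT == 8, so sizes are compared in bytes. */
const char *data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 4 && sizeof(int) == 4)
        return "LLP64";   /* 64-bit Windows: int and long stay 32 bits */
    if (sizeof(void *) == 8 && sizeof(long) == 8 && sizeof(int) == 4)
        return "LP64";    /* most 64-bit Unix-like systems */
    if (sizeof(void *) == 4 && sizeof(long) == 4 && sizeof(int) == 4)
        return "ILP32";   /* typical 32-bit systems */
    return "other";
}
```

Under LLP64 the only standard integer type wider than 32 bits is long long, which is why size_t becomes the odd one out when porting int-based code.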
 
Keith Thompson

jacob navia said:
If we use size_t everywhere, it is an UNSIGNED quantity.
This means that comparisons with signed quantities will provoke
other warnings, etc etc.

Perhaps those other signed quantities should have been unsigned as
well.

int s = strlen(str) is NOT broken.

And yet the compiler you're using warns about it. Perhaps you should
take it up with the author of the compiler.

There may well be no easy way to address your problem. Re-writing all
the code as it should have been written in the first place (using
size_t to hold size_t values) may not be practical. Turning off
warnings that you know aren't necessary, while leaving other warnings
in place, requires conveying that information to the compiler; there
may not be a mechanism for doing so. Inserting hundreds of casts
could suppress the warnings, but I dislike that solution, and it's
still a substantial amount of work.

I suppose you could write a strlen wrapper that calls the real strlen,
checks whether the result exceeds INT_MAX (if you think that check is
worth doing), and then returns the result as an int. That's assuming
strlen calls are the only things triggering the warnings. And you'd
still have to make hundreds of changes in the code.
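A minimal sketch of such a wrapper, assuming that aborting is an acceptable response to an over-long string (the thread does not say what the check should do on failure). The name strlen_int is made up for illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: strlen() with an int result.
   Aborts instead of silently truncating if the length does not fit. */
int strlen_int(const char *s)
{
    size_t n = strlen(s);

    if (n > (size_t)INT_MAX) {
        fprintf(stderr, "strlen_int: length %zu exceeds INT_MAX\n", n);
        abort();
    }
    return (int)n;   /* safe: n has been checked against INT_MAX */
}
```

Call sites would then read s = strlen_int(str); the single cast lives inside the wrapper, so the narrowing is explicit in one place rather than repeated across hundreds of assignments.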

You know that the conversions aren't going to overflow, but C's type
system doesn't let you convey that knowledge to the compiler.
 
pete

jacob said:
If we use size_t everywhere, it is an UNSIGNED quantity.
This means that comparisons with signed quantities will provoke
other warnings, etc etc.

int s = strlen(str) is NOT broken.

Maybe the signed quantities should be unsigned?
 
Martin Wells

jacob navia:
The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.
I do not know how to get out of this problem. Maybe any of you has
a good idea? How do you solve this when porting to 64 bits?

Assuming that you've a shred of intelligence, I'm led to believe that
you suffer from "int syndrome".

"int syndrome" reminds me of old drivers, the kind of people who
always drive the canonical route somewhere. Even during rush-hour,
even at night when the streets are clear, they always take the same
route. I don't know if you'd call it stubbornness or stupidity. They
lack dynamic-ity.

These drivers remind me of the programmers who are "int" people. The
solution to your boggle is so blatantly obvious that I'm not even
gonna mention what the solution is.

The real problem is why you feel so indoctrinated into using int,
especially places where you shouldn't be using it.

If you want advice though, I'd say use the appropriate types where
appropriate, and to edit any code that uses types wrongly.

Martin
 
CBFalconer

jacob said:
I am trying to compile as much code in 64 bit mode as
possible to test the 64 bit version of lcc-win.

The problem appears now that size_t is now 64 bits.

Fine. It has to be since there are objects that are more than
4GB long. The problem is, when you have in thousands of places

int s;

// ...
s = strlen(str) ;

Since strlen returns a size_t, we have a 64 bit result being
assigned to a 32 bit int.

This can be correct, and in 99.9999999999999999999999999%
of the cases the string will be smaller than 2GB...

Simply define s as a long long or (better) as a size_t.
 
CBFalconer

jacob said:
2GB strings are the most you can get under the windows schema in 32
bits.


Yes, "*POTENTIALLY*" I could be missing all those strings longer
than 4GB (!!!). But I do not care about those :)

That is precisely the sloppy attitude that has led to many bugs.
 
CBFalconer

jacob said:
.... snip ...

int s = strlen(str) is NOT broken.

Yes it is. How can you guarantee that strlen never returns a value
that exceeds the capacity of an int? However:

size_t s = strlen(str);

is NOT broken, assuming suitable #includes and definitions.
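Spelled out with the includes it needs, the size_t version might look like the sketch below. The function name report_length is hypothetical; %zu is the C99 printf conversion for size_t, so this assumes a C99-conforming library.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */
#include <string.h>  /* strlen */

/* Hypothetical helper: print and return a string's length
   without any narrowing conversion. */
size_t report_length(const char *str)
{
    size_t s = strlen(str);        /* same type as strlen's result */

    printf("length = %zu\n", s);   /* %zu matches size_t */
    return s;
}
```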
 
Keith Thompson

CBFalconer said:
jacob navia wrote:
... snip ...

Yes it is. How can you guarantee that strlen never returns a value
that exceeds the capacity of an int?

By never passing it a pointer to a string longer than INT_MAX
characters. This tends to be easier than, for example, guaranteeing
that 'x + y' will never overflow.

The declaration may or may not be broken, depending on what happens at
run time. The problem is that, apparently, the programmer knows it's
safe, but the compiler doesn't have enough information to prove it.

The ideal solution is to declare s as a size_t, and to make whatever
other code changes follow from that, but that's not always practical.
 
Ian Collins

jacob said:
If we use size_t everywhere, it is an UNSIGNED quantity.
This means that comparisons with signed quantities will provoke
other warnings, etc etc.

int s = strlen(str) is NOT broken.

Why would you want to assign an unsigned value to an int? Why do you
think it makes sense to have a negative size?
 
CBFalconer

Keith said:
By never passing it a pointer to a string longer than INT_MAX
characters. This tends to be easier than, for example, guaranteeing
that 'x + y' will never overflow.

The declaration may or may not be broken, depending on what happens at
run time. The problem is that, apparently, the programmer knows it's
safe, but the compiler doesn't have enough information to prove it.

The ideal solution is to declare s as a size_t, and to make whatever
other code changes follow from that, but that's not always practical.

Which I said, and you snipped. Why?
 
jacob navia

Ian said:
Why would you want to assign an unsigned value to an int? Why do you
think it makes sense to have a negative size?

Because that int is used in many other contexts later, for instance
comparing it with other integers.

int len = strlen(str);

for (i=0; i<len; i++) {
    /// etc
}


The i<len comparison would provoke a warning if len is unsigned...
 
jacob navia

Martin said:
jacob navia:



Assuming that you've a shred of intelligence, I'm led to believe that
you suffer from "int syndrome".

"int syndrome" reminds me of old drivers, the kind of people who
always drive the canonical route somewhere. Even during rush-hour,
even at night when the streets are clear, they always take the same
route. I don't know if you'd call it stubbornness or stupidity. They
lack dynamic-ity.

These drivers remind me of the programmers who are "int" people. The
solution to your boggle is so blatantly obvious that I'm not even
gonna mention what the solution is.

The real problem is why you feel so indoctrinated into using int,
especially places where you shouldn't be using it.

If you want advice though, I'd say use the appropriate types where
appropriate, and to edit any code that uses types wrongly.

Martin

Assuming that you have a shred of intelligence, you will be able
to understand this:

That int is used in many other contexts later, for instance
comparing it with other integers.

int i,len = strlen(str);

for (i=0; i<len; i++) {
    /// etc
}


The i<len comparison would provoke a warning if len is unsigned...

If I make i unsigned too, then its usage within the loop will provoke
even more problems!
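For comparison, a sketch of what the all-size_t version of such a loop can look like, including the countdown case where unsigned indices most often go wrong. The function names (count_char, last_index_of) are hypothetical, and this is only one way to write it; whether it is practical across thousands of call sites is the open question in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: count occurrences of c, using size_t for both
   the index and the length so the comparison mixes no signedness. */
size_t count_char(const char *str, char c)
{
    size_t len = strlen(str);
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (str[i] == c)
            count++;
    }
    return count;
}

/* Reverse iteration: index of the last c, or (size_t)-1 if absent.
   "i-- > 0" tests before decrementing, so the body never sees a
   wrapped-around index. */
size_t last_index_of(const char *str, char c)
{
    for (size_t i = strlen(str); i-- > 0; ) {
        if (str[i] == c)
            return i;
    }
    return (size_t)-1;
}
```

The usual trap with an unsigned index is a countdown written as i >= 0, which is always true; the i-- > 0 form avoids it.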
 
