Overflow of size_t?

Paul N

I have a program which handles lines of text. My question relates to
where I have two strings (ie arrays of char or wchar_t) and want to
malloc a new array to hold the concatenated string.

My first thought was that you would need to check whether adding the
sizes was going to overflow a size_t. Otherwise you might allocate a
new array which was too small.

My second thought was that this was unnecessary. If a size_t can span
the available memory, and you already have both strings in memory,
then their combined length must fit in a size_t. So no problem.

But on third thoughts - does a size_t have to span the available
memory? My understanding is that it has to be big enough to hold the
size of any object. But there's no reason why you should be able to
allocate a single object filling the entire available memory. I
wondered if it were possible to have a system in which, say, there were
several Megs of memory but it could only be allocated in segments of
64K (I think the 286 used to work like this) and so a size_t need only
be 16 bits?

So do I need to worry about this or not? And is it the same in C and
in C++? Plus, any further comments (other than general disparagement
of malloc or of handling strings by hand) welcome!

Thanks.
Paul.
 
Alf P. Steinbach

* Paul N:
I have a program which handles lines of text. My question relates to
where I have two strings (ie arrays of char or wchar_t) and want to
malloc a new array to hold the concatenated string.

Why not

#include <string>

std::string const s1 = "blah blah";
std::string const s2 = "blah blah aha";
std::string const all = s1 + s2; // That's it! In C++.

My first thought was that you would need to check whether adding the
sizes was going to overflow a size_t. Otherwise you might allocate a
new array which was too small.

You shouldn't have strings that large unless you're on a severely
memory-challenged system like a 16-bit embedded processor.


My second thought was that this was unnecessary. If a size_t can span
the available memory, and you already have both strings in memory,
then their combined length must fit in a size_t. So no problem.

But on third thoughts - does a size_t have to span the available
memory? My understanding is that it has to be big enough to hold the
size of any object. But there's no reason why you should be able to
allocate a single object filling the entire available memory. I
wondered if it were possible to have a system in which, say, there were
several Megs of memory but it could only be allocated in segments of
64K (I think the 286 used to work like this) and so a size_t need only
be 16 bits?

Yes. Not only could that be the case, it reportedly (according to a report in
this group) has been the case for one C++ compiler. However, it is not relevant
any more.

So do I need to worry about this or not?

Not on a modern system, no.

And is it the same in C and in C++?

No. In C++ you have standard library classes like std::string to deal with
allocation. Let them. :)

Plus, any further comments (other than general disparagement
of malloc or of handling strings by hand) welcome!

Using a signed size type can avoid a lot of silly-code workarounds.


Cheers & hth.,

- Alf
 
Eric Sosman

Paul said:
I have a program which handles lines of text. My question relates to
where I have two strings (ie arrays of char or wchar_t) and want to
malloc a new array to hold the concatenated string.

My first thought was that you would need to check whether adding the
sizes was going to overflow a size_t. Otherwise you might allocate a
new array which was too small.

My second thought was that this was unnecessary. If a size_t can span
the available memory, and you already have both strings in memory,
then their combined length must fit in a size_t. So no problem.

But on third thoughts - does a size_t have to span the available
memory? My understanding is that it has to be big enough to hold the
size of any object. But there's no reason why you should be able to
allocate a single object filling the entire available memory. I
wondered if it were possible to have a system in which, say, there were
several Megs of memory but it could only be allocated in segments of
64K (I think the 286 used to work like this) and so a size_t need only
be 16 bits?

So do I need to worry about this or not? And is it the same in C and
in C++? Plus, any further comments (other than general disparagement
of malloc or of handling strings by hand) welcome!

In C (I don't know C++), size_t suffices to count the
bytes of the largest possible object, and perhaps higher.
It is not guaranteed that an object could be large enough
to occupy all of memory, so size_t is not guaranteed to be
able to count all the bytes in memory. You could, perhaps,
have two or three or N largest-size objects in memory at the
same time, and the sum of their sizes might be too large for
size_t to express. In theory, anyhow.
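
A minimal sketch of that point in C: two sizes that are each valid can
still have a sum that size_t cannot express, so the addition has to be
checked before it is performed. The name add_sizes is made up for this
illustration and is not a standard function; SIZE_MAX comes from
<stdint.h> (C99).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and returns true when the
 * result fits in a size_t; returns false (leaving *sum untouched)
 * when the mathematical sum would exceed SIZE_MAX. */
bool add_sizes(size_t a, size_t b, size_t *sum)
{
    /* SIZE_MAX - b cannot wrap, so this comparison is always safe. */
    if (a > SIZE_MAX - b)
        return false;
    *sum = a + b;
    return true;
}
```

The comparison is done against SIZE_MAX - b rather than by computing
a + b first, so no wrapped value is ever produced.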

Should you worry? That's really your decision, not mine.
If you're writing for "mainstream" systems with "flat" memory,
probably not. If you're writing for embedded applications or
for "exotic" machines, perhaps a few sleepless nights would
be called for.
 
Jonathan Lee

So do I need to worry about this or not?

I had a similar consideration for a big integer library
and basically decided that the check costs next to
nothing. So why not do it?

In practical terms, though, if
stringone.size() + stringtwo.size() > max_value_of_size_t
then one or both of the strings must have a size greater
than or equal to max_value_of_size_t/2. In practice, I
think new[] would throw trying to allocate memory for
a string that size. In other words, the situation you
describe is probably impossible.
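
For the malloc case the original question asked about, the check could
look roughly like the sketch below. It assumes plain char strings, and
concat_checked is a hypothetical name chosen for this example, not a
library function.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'ed string holding a
 * followed by b, or NULL if the required size does not fit in a
 * size_t or the allocation fails. The caller frees the result. */
char *concat_checked(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* We need la + lb + 1 bytes (the +1 is for the terminator).
     * That exceeds SIZE_MAX exactly when la >= SIZE_MAX - lb. */
    if (la >= SIZE_MAX - lb)
        return NULL;

    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* copies b's terminator too */
    return out;
}
```

The extra cost is one subtraction and one comparison per
concatenation, which is small next to the two strlen calls.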

--Jonathan
 
Pascal J. Bourguignon

Alf P. Steinbach said:
* Paul N:

Why not

#include <string>

std::string const s1 = "blah blah";
std::string const s2 = "blah blah aha";
std::string const all = s1 + s2; // That's it! In C++.



You shouldn't have strings that large unless you're on a severely
memory-challenged system like a 16-bit embedded processor.

So you're saying that it is possible to have such strings.

Yes. Not only could that be the case, it reportedly (according to a
report in this group) has been the case for one C++ compiler. However,
it is not relevant any more.

So you mean, that yes, it's possible.

Not on a modern system, no.

But there exist systems where it would be true.


No. In C++ you have standard library classes like std::string to deal
with allocation. Let them. :)


Using a signed size type can avoid a lot of silly-code workarounds.

Roll over may occur with signed types too. Remember that C and C++
cannot do usual arithmetic, but rely on the underlying machine, and
most processors implement modulo arithmetic. Therefore C and C++
implement modulo arithmetic on most processors. (Actually, if I
understand the standards correctly, they MUST implement modulo
arithmetic on any computer, but we could have one with a "word size"
big enough that it wouldn't matter, but it would be quite a strange
computer...).
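
One caveat worth a sketch: as far as the C standard is concerned, only
unsigned arithmetic is defined to wrap modulo 2^N. Overflow of a
signed type is undefined behaviour, even if the hardware happens to
wrap. So an unsigned sum can be tested after the fact, while a signed
sum has to be tested before the addition. Both function names below
are invented for this illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned case: the wrapped result is well defined, so it is safe
 * to add first and then look at the result. */
bool unsigned_add_wrapped(size_t a, size_t b)
{
    size_t sum = a + b;        /* reduced modulo SIZE_MAX + 1 */
    return sum < a;            /* true only if the sum wrapped */
}

/* Signed case: the check must come before the addition, because
 * a + b itself would be undefined on overflow. */
bool signed_add_would_overflow(long a, long b)
{
    if (b > 0)
        return a > LONG_MAX - b;
    if (b < 0)
        return a < LONG_MIN - b;
    return false;
}
```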
 
Barry Schwarz

I have a program which handles lines of text. My question relates to
where I have two strings (ie arrays of char or wchar_t) and want to
malloc a new array to hold the concatenated string.

Make up your mind which language you intend to use. C and C++ are
separate languages with just enough similarity to confuse people. Once
you decide, please post only to the group relevant to that language.
 
James Kanze

I have a program which handles lines of text. My question
relates to where I have two strings (ie arrays of char or
wchar_t) and want to malloc a new array to hold the
concatenated string.

My first thought was that you would need to check whether
adding the sizes was going to overflow a size_t. Otherwise you
might allocate a new array which was too small.

My second thought was that this was unnecessary. If a size_t
can span the available memory, and you already have both
strings in memory, then their combined length must fit in a
size_t. So no problem.

But on third thoughts - does a size_t have to span the
available memory?

Of course not.

My understanding is that it has to be big enough to hold the
size of any object. But there's no reason why you should be
able to allocate a single object filling the entire available
memory. I wondered if it were possible to have a system in
which, say, there were several Megs of memory but it could only
be allocated in segments of 64K (I think the 286 used to work
like this) and so a size_t need only be 16 bits?

The 16-bit Intels operated in this mode, and there's really no
reason to suppose that there aren't processors around today that
still do. If your portability is limited to desktop machines,
you probably won't have any problems, but if you have to
consider smaller systems, I don't know. (It's been some years
since I've worked on embedded systems.)

So do I need to worry about this or not? And is it the same in
C and in C++?

It's the same in C and in C++. It's likely more relevant in C,
since from what I hear, a lot of the smaller systems only have C
compilers, not C++.
 
James Kanze

I had a similar consideration for a big integer library
and basically decided that the check costs next to
nothing. So why not do it?

In practical terms, though, if stringone.size() +
stringtwo.size() > max_value_of_size_t then one or both of the
strings must have a size greater than or equal to
max_value_of_size_t/2. In practice, I think new[] would throw
trying to allocate memory for a string that size. In other
words, the situation you describe is probably impossible.

Not necessarily. I've handled strings of 40KB and more on Intel
16 bit processors. And I couldn't have concatenated them.
 
Jonathan Lee

Not necessarily.  I've handled strings of 40KB and more on Intel
16 bit processors.  And I couldn't have concatenated them.

No, not necessarily. Just probable. I mean, if the OP were
working on a system like that, would he be asking his question?

--Jonathan
 
Bart van Ingen Schenau

Paul said:
I have a program which handles lines of text. My question relates to
where I have two strings (ie arrays of char or wchar_t) and want to
malloc a new array to hold the concatenated string.

So do I need to worry about this or not? And is it the same in C and
in C++? Plus, any further comments (other than general disparagement
of malloc or of handling strings by hand) welcome!

If you expect you have to concatenate lines that are larger than 32k (or
SIZE_MAX/2), then you should worry. Not just that the calculation can
overflow (which can easily be detected), but also what you want to do
instead as you will NOT be able to allocate an object large enough to
store the concatenated string.

And if you have a fall-back strategy for those very extreme cases, you
might consider using that fall-back earlier, so that the strings never
grow to the critical size of SIZE_MAX/2.
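
A rough sketch of that idea in C: refuse (or switch to the fall-back)
as soon as either line reaches a chosen limit, so the sum can never
overflow. LINE_LIMIT and can_concat are assumptions made up for this
example; a real program would probably pick a much smaller limit than
SIZE_MAX/2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed policy limit for a single line, set here to the critical
 * size mentioned above. */
#define LINE_LIMIT (SIZE_MAX / 2)

/* Hypothetical check: true when both lengths are below the limit.
 * In that case len_a + len_b + 1 is at most SIZE_MAX - 2, so the
 * size computation for the concatenated string cannot overflow. */
bool can_concat(size_t len_a, size_t len_b)
{
    return len_a < LINE_LIMIT && len_b < LINE_LIMIT;
}
```

When can_concat returns false, the caller takes whatever fall-back it
has instead of calling malloc.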

Bart v Ingen Schenau
 
