Overflow of size_t?

Discussion in 'C Programming' started by Paul N, Jul 3, 2009.

  1. Paul N

    Paul N Guest

    I have a program which handles lines of text. My question relates to
    where I have two strings (ie arrays of char or wchar_t) and want to
    malloc a new array to hold the concatenated string.

    My first thought was that you would need to check whether adding the
    sizes was going to overflow a size_t. Otherwise you might allocate a
    new array which was too small.
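
    For concreteness, here is a minimal sketch of the check being described, for the plain char case. The helper name concat_checked is hypothetical, and it assumes C99's SIZE_MAX from <stdint.h>. The test is written as a subtraction so that the check itself cannot wrap.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc */
#include <string.h>   /* strlen, memcpy */

/* Hypothetical helper: returns a newly malloc'd concatenation of a and b,
   or NULL if the combined size does not fit in a size_t or malloc fails.
   The caller is responsible for calling free() on the result. */
char *concat_checked(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* We need la + lb + 1 <= SIZE_MAX. SIZE_MAX - la cannot wrap,
       so compare against that instead of forming the sum first. */
    if (lb >= SIZE_MAX - la)
        return NULL;

    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* also copies b's terminating '\0' */
    return out;
}
```

    A caller would treat a NULL return the same way as any other allocation failure.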

    My second thought was that this was unnecessary. If a size_t can span
    the available memory, and you already have both strings in memory,
    then their combined length must fit in a size_t. So no problem.

    But on third thoughts - does a size_t have to span the available
    memory? My understanding is that it has to be big enough to hold the
    size of any object. But there's no reason why you should be able to
    allocate a single object filling the entire available memory. I
    wondered if it were possible to have a system in which, say, there were
    several Megs of memory but it could only be allocated in segments of
    64K (I think the 286 used to work like this) and so a size_t need only
    be 16 bits?

    So do I need to worry about this or not? And is it the same in C and
    in C++? Plus, any further comments (other than general disparagement
    of malloc or of handling strings by hand) welcome!

    Thanks.
    Paul.
     
    Paul N, Jul 3, 2009
    #1

  2. Alf P. Steinbach

    * Paul N:
    > I have a program which handles lines of text. My question relates to
    > where I have two strings (ie arrays of char or wchar_t) and want to
    > malloc a new array to hold the concatenated string.


    Why not

    #include <string>
    using std::string;
    string const s1 = "blah blah";
    string const s2 = "blah blah aha";
    string const all = s1 + s2; // That's it! In C++.


    > My first thought was that you would need to check whether adding the
    > sizes was going to overflow a size_t. Otherwise you might allocate a
    > new array which was too small.


    You shouldn't have strings that large unless you're on a severely
    memory-challenged system like a 16-bit embedded processor.



    > My second thought was that this was unnecessary. If a size_t can span
    > the available memory, and you already have both strings in memory,
    > then their combined length must fit in a size_t. So no problem.
    >
    > But on third thoughts - does a size_t have to span the available
    > memory? My understanding is that it has to be big enough to hold the
    > size of any object. But there's no reason why you should be able to
    > allocate a single object filling the entire available memory. I
    > wondered if it were possible to have a system in which, say, there were
    > several Megs of memory but it could only be allocated in segments of
    > 64K (I think the 286 used to work like this) and so a size_t need only
    > be 16 bits?


    Yes, not only could be but reportedly (reported in this group) has actually been
    the case for one C++ compiler. However, not relevant any more.


    > So do I need to worry about this or not?


    Not on a modern system, no.


    > And is it the same in C and in C++?


    No. In C++ you have standard library classes like std::string to deal with
    allocation. Let them. :)


    > Plus, any further comments (other than general disparagement
    > of malloc or of handling strings by hand) welcome!


    Using a signed size type can avoid a lot of silly-code workarounds.


    Cheers & hth.,

    - Alf
     
    Alf P. Steinbach, Jul 3, 2009
    #2

  3. Eric Sosman

    Eric Sosman Guest

    Paul N wrote:
    > I have a program which handles lines of text. My question relates to
    > where I have two strings (ie arrays of char or wchar_t) and want to
    > malloc a new array to hold the concatenated string.
    >
    > My first thought was that you would need to check whether adding the
    > sizes was going to overflow a size_t. Otherwise you might allocate a
    > new array which was too small.
    >
    > My second thought was that this was unnecessary. If a size_t can span
    > the available memory, and you already have both strings in memory,
    > then their combined length must fit in a size_t. So no problem.
    >
    > But on third thoughts - does a size_t have to span the available
    > memory? My understanding is that it has to be big enough to hold the
    > size of any object. But there's no reason why you should be able to
    > allocate a single object filling the entire available memory. I
    > wondered if it were possible to have a system in which, say, there were
    > several Megs of memory but it could only be allocated in segments of
    > 64K (I think the 286 used to work like this) and so a size_t need only
    > be 16 bits?
    >
    > So do I need to worry about this or not? And is it the same in C and
    > in C++? Plus, any further comments (other than general disparagement
    > of malloc or of handling strings by hand) welcome!


    In C (I don't know C++), size_t suffices to count the
    bytes of the largest possible object, and perhaps higher.
    It is not guaranteed that an object could be large enough
    to occupy all of memory, so size_t is not guaranteed to be
    able to count all the bytes in memory. You could, perhaps,
    have two or three or N largest-size objects in memory at the
    same time, and the sum of their sizes might be too large for
    size_t to express. In theory, anyhow.
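
    To illustrate what "too large for size_t to express" would look like in code, here is a small sketch. The function name size_add_checked is hypothetical. It relies on the fact that unsigned arithmetic in C is defined to wrap modulo SIZE_MAX + 1, so a wrapped sum comes out smaller than either operand.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Hypothetical helper: stores a + b in *sum and returns 1 if the true sum
   fits in a size_t, or returns 0 if the addition wrapped around. */
int size_add_checked(size_t a, size_t b, size_t *sum)
{
    assert(sum != NULL);

    *sum = a + b;        /* unsigned: wraps, but is never undefined */
    return *sum >= a;    /* a wrapped result is smaller than a (and b) */
}
```

    On a hypothetical implementation with a 16-bit size_t, two 40000-byte objects would make this return 0, since 80000 exceeds 65535.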

    Should you worry? That's really your decision, not mine.
    If you're writing for "mainstream" systems with "flat" memory,
    probably not. If you're writing for embedded applications or
    for "exotic" machines, perhaps a few sleepless nights would
    be called for.

    --
    Eric Sosman
     
    Eric Sosman, Jul 3, 2009
    #3
  4. Jonathan Lee

    Jonathan Lee Guest

    On Jul 3, 10:20 am, Paul N wrote:
    > So do I need to worry about this or not?


    I had a similar consideration for a big integer library
    and basically decided that the check costs next to
    nothing. So why not do it?

    In practical terms, though, if
    stringone.size() + stringtwo.size() > max_value_of_size_t
    then one or both of the strings must have a size greater
    than or equal to max_value_of_size_t/2. In practice, I
    think new[] would throw trying to allocate memory for
    a string that size. In other words, the situation you
    describe is probably impossible.
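
    Since the original question also mentions wchar_t, a sketch of the "cheap check" for that case may help: the element count has to be checked, and so does the multiplication by the element size. The helper name wconcat_checked is hypothetical, and this is only one way to arrange the tests.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc */
#include <wchar.h>    /* wchar_t, wcslen, wmemcpy */

/* Hypothetical helper: concatenates two wide strings into new storage.
   Returns NULL if the element count or the byte count would not fit
   in a size_t, or if malloc fails. */
wchar_t *wconcat_checked(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);

    size_t la = wcslen(a);
    size_t lb = wcslen(b);

    /* Check 1: la + lb + 1 elements must be representable. */
    if (lb >= SIZE_MAX - la)
        return NULL;
    size_t count = la + lb + 1;

    /* Check 2: count * sizeof(wchar_t) bytes must be representable. */
    if (count > SIZE_MAX / sizeof(wchar_t))
        return NULL;

    wchar_t *out = malloc(count * sizeof *out);
    if (out == NULL)
        return NULL;

    wmemcpy(out, a, la);
    wmemcpy(out + la, b, lb + 1);   /* includes the terminating L'\0' */
    return out;
}
```

    Both checks are a compare or a divide by a constant, which supports the point that the test costs next to nothing.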

    --Jonathan
     
    Jonathan Lee, Jul 3, 2009
    #4
  5. Pascal J. Bourguignon

    "Alf P. Steinbach" writes:

    > * Paul N:
    >> I have a program which handles lines of text. My question relates to
    >> where I have two strings (ie arrays of char or wchar_t) and want to
    >> malloc a new array to hold the concatenated string.

    >
    > Why not
    >
    > string const s1 = "blah blah";
    > string const s2 = "blah blah aha";
    > string const all = s1 + s2; // That's it! In C++.
    >
    >
    >> My first thought was that you would need to check whether adding the
    >> sizes was going to overflow a size_t. Otherwise you might allocate a
    >> new array which was too small.

    >
    > You shouldn't have strings that large unless you're on a severely
    > memory-challenged system like a 16-bit embedded processor.


    So you're saying that it is possible to have such strings.


    >> My second thought was that this was unnecessary. If a size_t can span
    >> the available memory, and you already have both strings in memory,
    >> then their combined length must fit in a size_t. So no problem.
    >>
    >> But on third thoughts - does a size_t have to span the available
    >> memory? My understanding is that it has to be big enough to hold the
    >> size of any object. But there's no reason why you should be able to
    >> allocate a single object filling the entire available memory. I
    >> wondered if it were possible to have a system in which, say, there were
    >> several Megs of memory but it could only be allocated in segments of
    >> 64K (I think the 286 used to work like this) and so a size_t need only
    >> be 16 bits?

    >
    > Yes, not only could be but reportedly (reported in this group) has
    > actually been the case for one C++ compiler. However, not relevant any
    > more.


    So you mean, that yes, it's possible.


    >> So do I need to worry about this or not?

    >
    > Not on a modern system, no.


    But there exist systems where it would be true.



    >> And is it the same in C and in C++?

    >
    > No. In C++ you have standard library classes like std::string to deal
    > with allocation. Let them. :)



    >> Plus, any further comments (other than general disparagement
    >> of malloc or of handling strings by hand) welcome!

    >
    > Using a signed size type can avoid a lot of silly-code workarounds.


    Roll over may occur with signed types too. Remember that C and C++
    cannot do usual arithmetic, but rely on the underlying machine, and
    most processors implement modulo arithmetic. Therefore C and C++
    implement modulo arithmetic on most processors. (Actually, if I
    understand correctly the standards, they MUST implement modulo
    arithmetic on any computer, but we could have one with a "word size"
    big enough that it wouldn't matter, but it would be quite a strange
    computer...).
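
    One caveat, as far as the C standard itself is concerned: wrap-around is only guaranteed for unsigned types. Overflow of a signed type is undefined behaviour, even on hardware that wraps. So with a signed size type the check has to happen before the addition. A minimal sketch, with the hypothetical name add_long_checked and long standing in for whatever signed type is used:

```c
#include <assert.h>
#include <limits.h>   /* LONG_MAX, LONG_MIN */
#include <stddef.h>   /* NULL */

/* Hypothetical helper: returns 1 and stores a + b in *out if the sum is
   representable as a long. Returns 0 otherwise, without ever performing
   the overflowing addition. */
int add_long_checked(long a, long b, long *out)
{
    assert(out != NULL);

    if (b > 0 && a > LONG_MAX - b)   /* would exceed LONG_MAX */
        return 0;
    if (b < 0 && a < LONG_MIN - b)   /* would go below LONG_MIN */
        return 0;

    *out = a + b;
    return 1;
}
```

    The subtractions in the two tests cannot overflow themselves, because b has the sign that moves the limit toward zero.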

    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, Jul 3, 2009
    #5
  6. Barry Schwarz

    On Fri, 3 Jul 2009 07:20:36 -0700 (PDT), Paul N wrote:

    >I have a program which handles lines of text. My question relates to
    >where I have two strings (ie arrays of char or wchar_t) and want to
    >malloc a new array to hold the concatenated string.


    Make up your mind which language you intend to use. C and C++ are
    separate languages with just enough similarity to confuse people. Once
    you decide, please post only to the group relevant to that language.

    --
    Remove del for email
     
    Barry Schwarz, Jul 3, 2009
    #6
  7. James Kanze

    James Kanze Guest

    On Jul 3, 4:20 pm, Paul N wrote:
    > I have a program which handles lines of text. My question
    > relates to where I have two strings (ie arrays of char or
    > wchar_t) and want to malloc a new array to hold the
    > concatenated string.


    > My first thought was that you would need to check whether
    > adding the sizes was going to overflow a size_t. Otherwise you
    > might allocate a new array which was too small.


    > My second thought was that this was unnecessary. If a size_t
    > can span the available memory, and you already have both
    > strings in memory, then their combined length must fit in a
    > size_t. So no problem.


    > But on third thoughts - does a size_t have to span the
    > available memory?


    Of course not.

    > My understanding is that it has to be big enough to hold the
    > size of any object. But there's no reason why you should be
    > able to allocate a single object filling the entire available
    > memory. I wondered if it were possible to have a system in
    > which, say, there were several Megs of memory but it could only
    > be allocated in segments of 64K (I think the 286 used to work
    > like this) and so a size_t need only be 16 bits?


    The 16-bit Intel processors operated in this mode, and there's really no
    reason to suppose that there aren't processors around today that
    still do. If your portability is limited to desktop machines,
    you probably won't have any problems, but if you have to
    consider smaller systems, I don't know. (It's been some years
    since I've worked on embedded systems.)

    > So do I need to worry about this or not? And is it the same in
    > C and in C++?


    It's the same in C and in C++. It's likely more relevant in C,
    since from what I hear, a lot of the smaller systems only have C
    compilers, not C++.

    --
    James Kanze (GABI Software)
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jul 3, 2009
    #7
  8. James Kanze

    James Kanze Guest

    On Jul 3, 4:50 pm, Jonathan Lee wrote:
    > On Jul 3, 10:20 am, Paul N wrote:


    > > So do I need to worry about this or not?


    > I had a similar consideration for a big integer library
    > and basically decided that the check costs next to
    > nothing. So why not do it?


    > In practical terms, though, if stringone.size() +
    > stringtwo.size() > max_value_of_size_t then one or both of the
    > strings must have a size greater than or equal to
    > max_value_of_size_t/2. In practice, I think new[] would throw
    > trying to allocate memory for a string that size. In other
    > words, the situation you describe is probably impossible.


    Not necessarily. I've handled strings of 40KB and more on Intel
    16 bit processors. And I couldn't have concatenated them.

    --
    James Kanze (GABI Software)
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jul 3, 2009
    #8
  9. Jonathan Lee

    Jonathan Lee Guest

    On Jul 3, 2:05 pm, James Kanze wrote:

    > > the situation you describe is probably impossible.

    > Not necessarily.  I've handled strings of 40KB and more on Intel
    > 16 bit processors.  And I couldn't have concatenated them.


    No, not necessarily. Just probable. I mean, if the OP were
    working on a system like that, would he be asking his question?

    --Jonathan
     
    Jonathan Lee, Jul 3, 2009
    #9
  10. Bart van Ingen Schenau

    Paul N wrote:

    > I have a program which handles lines of text. My question relates to
    > where I have two strings (ie arrays of char or wchar_t) and want to
    > malloc a new array to hold the concatenated string.
    >

    <snip>
    > So do I need to worry about this or not? And is it the same in C and
    > in C++? Plus, any further comments (other than general disparagement
    > of malloc or of handling strings by hand) welcome!


    If you expect you have to concatenate lines that are larger than 32k (or
    SIZE_MAX/2), then you should worry. Not just that the calculation can
    overflow (which can easily be detected), but also what you want to do
    instead as you will NOT be able to allocate an object large enough to
    store the concatenated string.

    And if you have a fall-back strategy for those very extreme cases, you
    might consider using that fall-back earlier, so that the strings never
    grow to the critical size of SIZE_MAX/2.
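
    One way to read this suggestion, sketched under the assumption that the program enforces a per-line cap of SIZE_MAX / 2: if no existing line ever exceeds the cap, the sum of two lengths plus a terminator cannot wrap, and the decision to fall back becomes a plain comparison. The names LINE_LIMIT, join_plan and plan_join are hypothetical.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Hypothetical policy: no line is ever allowed to grow past this. */
#define LINE_LIMIT (SIZE_MAX / 2)

enum join_plan { JOIN_ALLOCATE, JOIN_FALLBACK };

/* Decides whether two lines may be joined in memory or whether the
   program should use its fall-back strategy instead. */
enum join_plan plan_join(size_t len_a, size_t len_b)
{
    /* Invariant kept by the rest of the program. */
    assert(len_a <= LINE_LIMIT && len_b <= LINE_LIMIT);

    /* Both operands are at most SIZE_MAX / 2, so this is at most
       SIZE_MAX and cannot wrap. */
    size_t needed = len_a + len_b + 1;

    return needed > LINE_LIMIT ? JOIN_FALLBACK : JOIN_ALLOCATE;
}
```

    What the fall-back actually does (splitting the line, spilling to a file, refusing the operation) is left open here, as it is in the post above.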

    >
    > Thanks.
    > Paul.


    Bart v Ingen Schenau
    --
    a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
    c.l.c FAQ: http://c-faq.com/
    c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/
     
    Bart van Ingen Schenau, Jul 3, 2009
    #10