Convert string to size_t?

Rui Maciel

What's the preferred way to extract a size_t value from a string? A quick Google
search only returned suggestions such as atoi() and strtol(), which convert only to
signed integral types, and istringstream, which doesn't sound good,
particularly in my case, as I'm already getting the number string from a lexer.

So, any suggestions?


Thanks in advance,
Rui Maciel
 
Kaz Kylheku

What's the preferred way to extract a size_t value from a string? A quick google
search only returned suggestions such as atoi() and strtol(), which casts only to

atoi: avoid it; it has no error checking! atoi is only for throwaway
programs. (However, if your lexer can typographically validate the
number to be in a given range, it can work, but is still hacky).

It's true that strtol has a signed type, but your quick
google overlooked strtoul.

These strto* functions have error checking. Out of range inputs
are reported using a combination of sentinel return values and errno.

The thing about errno is that standard library functions never
reset errno to zero. They can only be relied upon to set errno
when an error occurs.

Before calling strtoul, set errno to 0. Then if the return value
is ULONG_MAX, check errno: if it contains ERANGE, the input
is too large for the type.
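
A minimal sketch of that errno protocol, for illustration only. The helper name parse_ulong is hypothetical, and it assumes the input is a NUL-terminated decimal string:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parses a decimal unsigned long from `text`.
// Returns true on success and stores the result in `out`.
bool parse_ulong(const char* text, unsigned long& out) {
    char* end = nullptr;
    errno = 0;  // the library never clears errno, so clear it before the call
    unsigned long value = std::strtoul(text, &end, 10);
    if (end == text || *end != '\0') {
        return false;  // no digits consumed, or trailing characters
    }
    if (value == ULONG_MAX && errno == ERANGE) {
        return false;  // input too large for unsigned long
    }
    out = value;
    return true;
}
```

Note that strtoul also accepts a leading minus sign and negates the result, so this sketch relies on the lexer having already restricted the token to digits.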

Note: I know this is comp.lang.c++, but from a C point of view, you
should be aware that C99 has a long long type. The point is that a
size_t type nowadays may be wider than unsigned long. In C99, a printed
representation of a size_t may need to be scanned with strtoull (string
to unsigned long long).
 
Michael Tsang

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
istringstream is the preferred way.

Please don't use attachments.

#include <cstddef>
#include <sstream>
#include <string>

std::size_t func(const std::string &s) {
    std::istringstream f(s);
    std::size_t ans = 0;  // stays 0 if the extraction fails
    f >> ans;
    // validation omitted
    return ans;
}
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAksWcioACgkQG6NzcAXitM890QCdGPfruLILtTJsrpUZhZ0vlR1j
Vk4AoI4UQh2xzn36nbasG7XRspBn1XrS
=VVo5
-----END PGP SIGNATURE-----
 
Michael Doubez

atoi: avoid it; it has no error checking! atoi is only for throwaway
programs. (However, if your lexer can typographically validate the
number to be in a given range, it can work, but is still hacky).

It's true that strtol has a signed type, but your quick
google overlooked strtoul.

These strto* functions have error checking. Out of range inputs
are reported using a combination of sentinel return values and errno.

The thing about errno is that standard library functions never
reset errno to zero. They can only be relied upon to set errno
when an error occurs.

Before calling strtoul, set errno to 0. Then if the return value
is ULONG_MAX, check errno: if it contains ERANGE, the input
is too large for the type.

Note: I know this is comp.lang.c++, but from a C point of view, you
should be aware that C99 has a long long type. The point is that a
size_t type nowadays may be wider than unsigned long.  In C99, a printed
representation of a size_t may need to be scanned with strtoull (string
to unsigned long long).

You have forgotten the end: size_t's integral type is not known.

You must compare the result from strtoul(l) with
std::numeric_limits<size_t>::max() to determine if there is an
overflow.
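
One way this check might look, as a sketch rather than a definitive implementation. The helper name parse_size is hypothetical, std::strtoull assumes a C99-capable library (C++11 or later), and the input is assumed to contain only decimal digits:

```cpp
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <limits>

// Hypothetical helper: parses a decimal size_t from `text`.
// Returns true on success and stores the result in `out`.
bool parse_size(const char* text, std::size_t& out) {
    char* end = nullptr;
    errno = 0;
    unsigned long long value = std::strtoull(text, &end, 10);
    if (end == text || *end != '\0') {
        return false;  // no digits consumed, or trailing characters
    }
    if (value == ULLONG_MAX && errno == ERANGE) {
        return false;  // too large even for unsigned long long
    }
    if (value > std::numeric_limits<std::size_t>::max()) {
        return false;  // fits the wide type, but overflows size_t
    }
    out = static_cast<std::size_t>(value);
    return true;
}
```

Parsing into the widest unsigned type first and then comparing against the size_t limit avoids having to know which integral type size_t actually is.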
 
Michael Tsang

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I am not using any attachments. Please stop complaining to others about
your own broken newsreader, which does not understand what
"Content-Disposition: inline" means.


What I said.


Please stop throwing illegible gibberish in front of everyone's face,
instead of using a correct MIME content type that's usable by all
MIME-compliant messaging software that can handle it properly, according
to their security capabilities.

I'm using KNode 4.3.2 and I've signed this article with PGP.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAksXzbsACgkQG6NzcAXitM+1PACghsoRZoyBHdw5sJfbhznpewrF
QzIAn20NlUPSHcUryFwvFC7+imp/cjcL
=YFvN
-----END PGP SIGNATURE-----
 
Michael Tsang

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


C++ has long long type in the 2009 standard.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAksXze0ACgkQG6NzcAXitM+1MgCfcSoQQb/ppgKZGVh/jx9iFoiH
8NkAn0Z1t1B3yYDVc7D6ldi33mkYqjD3
=xYps
-----END PGP SIGNATURE-----
 
James Kanze

[...]
I am not using any attachments. Please stop complaining to
others about your own broken newsreader, which does not
understand what "Content-Disposition: inline" means.

You are, and if the injection site enforced the rules, your
posts wouldn't get in.

In this case, however, it's really a case of the pot calling the
kettle black, because all the PGP signature junk is even more
offensive. (Technically, I don't think it's forbidden, because
in the end, it's just text. But it certainly violates the
spirit of the rules, and it's very irritating.)

Of course, given the number of layers any posting goes through,
it's not always possible to be 100% conformant, but I suspect
that both of you could easily turn the offensive bits off, if
you didn't want to be intentionally antisocial.
 
Ian Collins

James said:
Of course, given the number of layers any posting goes through,
it's not always possible to be 100% conformant, but I suspect
that both of you could easily turn the offensive bits off, if
you didn't want to be intentionally antisocial.

Like broken sigs?

:)
 
Default User

Ian said:
Like broken sigs?

Not technically his fault, as Google Groups break legal separators. Of
course, knowing this, he could refrain from using one.



Brian
 
Michael Tsang

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James said:
C++ doesn't have a 2009 standard.

So what is C++0x? Isn't it supposed to be the 2009 standard?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAksY4w0ACgkQG6NzcAXitM+UbwCfc8mnHFK/BlCSH4LtkB27mpJr
WoQAn1rsKMs/5po6tz6rmekt0+tK2Ckd
=ij9p
-----END PGP SIGNATURE-----
 
Michael Doubez

So what is C++0x? Isn't it supposed to be the 2009 standard?

No. Perhaps it will be C++0a or C++0b, which would make it 2010 or 2011.
And it says no such thing, only that size_t is defined in the cstddef
header and that this header is the same as the C library header.

I don't think the C standard imposes a type for size_t; it can be any
numeric type (which is a pain with printf()-like format function).
 
James Kanze

So what is C++0x? Isn't it supposed to be the 2009 standard?
No. Perhaps it will be C++0a or C++0b which make 2010 or 2011.

The way things are going... It's already too late for 2010 (it
takes at least six months after a successful CD), and 2011 seems
seriously compromised as well.
And it says no such thing only that size_t is defined in
cstddef header and that this header is the same as the C
library header.

More precisely: C++03 says that size_t is defined as in C90, and
C++0x will say that it is defined as in C99.

But the issue wasn't size_t, it was long long. C++0x will have
long long, and all of the other extended types of C99.
I don't think the C standard imposes a type for size_t; it can
be any numeric type (which is a pain with printf()-like format
function).

It must be an unsigned integral type. In C90, this meant one of
char (maybe), unsigned char, unsigned short, unsigned int or
unsigned long: casting to unsigned long, and outputting with
%lu, was guaranteed to work. In C99, there's a special
formatter for it (although I forget what).
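
For reference, the C99 addition is the z length modifier, so a size_t prints with %zu. A small sketch, assuming a library that supports the C99 printf family (C++11 or later); the helper name format_size is hypothetical:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t using the C99 `z` length modifier.
std::string format_size(std::size_t n) {
    char buf[64];  // generous; snprintf truncates safely if it were too small
    std::snprintf(buf, sizeof buf, "%zu", n);
    return std::string(buf);
}
```

With this modifier no cast to unsigned long is needed, whatever the underlying type of size_t turns out to be.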
 
Kaz Kylheku

So what is C++0x? Isn't it supposed to be the 2009 standard?

I suspect the idea there is that 0x is the prefix for some hex number.

C++0x perhaps means: ``when we get this straightened out, we will convert
the year to a hex number and stick it on''.

So for instance, if they hammer it into shape in the year 2011, it will be:

C++0x7DB

:)
 
Michael Doubez

It must be an unsigned integral type.

At least on one architecture I worked on, it was a signed integral (I
don't remember which architecture); I guess it was not conformant.

From what I have found, it could be a matter of backward compatibility
with sys/types.h in Linux which defined it as a signed integer.
 In C90, this meant one of
char (maybe), unsigned char, unsigned short, unsigned int or
unsigned long:

Googling for it, C99 introduced a minimum size for size_t: at least 16
bits.
casting to unsigned long, and outputting with %lu, was guaranteed to work.

Well, you cannot have a type bigger than unsigned long. It seems C99 also
recommends that it should convert to unsigned long.

Does that make size_t 32 bits? Or does that make unsigned long 64
bits?
 
James Kanze

James Kanze wrote:
I'm a bit surprised by this. unsigned long is typically 32
bits; on 64-bit systems size_t tends to be 64 bit. A cast
would _not_ work!

The current C++ standard guarantees that unsigned long be the
largest integral type available, as did C90. If an
implementation makes size_t 64 bits, but unsigned long 32, it's
not C++ (but it will be C++ when the next version of the
standard appears). In practice, most implementations for 64 bit
systems make long 64 bits, as this is the only reasonable choice
for such systems. (Arguably, int should also be 64 bits on such
systems, but there are various reasons why this is not usually
the case.)

Note that in C90 and C++, the guarantee that long was the
largest integral type was an important one---more important in
C90, of course. And there was very extensive discussion in the
C committee about allowing larger types, with a great deal of
opposition precisely because it broke things like
``printf("%lu",(unsigned long)a_size_t)'' (which had been
guaranteed), and broke them in the worst way. On the other
hand, it was realized that there would probably be a need in the
future for even larger types (uint256_t, etc.), and even on 32
bit machines, there was a need for a 64 bit integral type, so
the committee bit the bullet. Still, from a quality of
implementation point of view, I wouldn't expect an
implementation to make size_t larger than unsigned long unless
there were two or more types larger than int.
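
Whether the old cast-to-unsigned-long idiom is still lossless on a given implementation can be checked at compile time. A rough sketch, assuming C++11 for static_assert and constexpr; the helper name size_fits_in_ulong is hypothetical:

```cpp
#include <cstddef>
#include <limits>

// C99 requires SIZE_MAX to be at least 65535, i.e. size_t has at least 16 bits.
static_assert(std::numeric_limits<std::size_t>::max() >= 65535u,
              "size_t must be able to represent at least 65535");

// Hypothetical helper: true when every size_t value survives a cast to
// unsigned long, so that printing via "%lu" cannot truncate.
constexpr bool size_fits_in_ulong() {
    return std::numeric_limits<std::size_t>::max()
           <= std::numeric_limits<unsigned long>::max();
}
```

On an implementation with a 64-bit size_t and a 32-bit unsigned long this would yield false, which is exactly the case where the cast silently loses the high bits.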
 
James Kanze

At least on one architecture I worked on, it was a signed
integral (I don't remember which architecture); I guess it was
not conformant.

Not according to C90 (or Posix).
From what I have found, it could be a matter of backward
compatibility with sys/types.h in Linux which defined it as a
signed integer.

On a Unix like, I find this very surprising. From memory, Unix
has always required it to be an unsigned type.
Googling for it, C99 introduced a minimum size for size_t: at
least 16 bits.

Seems reasonable. I don't think a machine which didn't allow
objects (including arrays) larger than 256 bytes would have many
takers.
Well you cannot have bigger than unsigned long. It seems C99
also recommends it should converts to unsigned long.

I don't have my C99 standard on hand to verify, but simple
common sense (or respect for your users) would seem to require
avoiding breaking working code (which counted on the previous
guarantee) if possible. The one exception I could see is if you
have several built-in integral types larger than int, say 32 bit
ints, 64 bit longs and 128 bit long long, and you really do
support objects larger than 2^64 bytes.
Does that make size_t 32 bits ? Or does that make unsigned
long 64 bits ?

size_t strictly limits the size of the largest possible object
(including arrays). Since one of the reasons for moving an
application to 64 bits is to have larger arrays, I don't think
making size_t 32 bits is an option in such cases.
 
Michael Doubez

I don't think the C standard imposes a type for size_t; it can
be any numeric type (which is a pain with printf()-like format
function).
It must be an unsigned integral type.
[snip]
From what I have found, it could be a matter of backward
compatibility with sys/types.h in Linux which defined it as a
signed integer.

On a Unix like, I find this very surprising.  From memory, Unix
has always required it to be an unsigned type.

IIRC it was with SunOS.

From GNU pages:
http://www.gnu.org/s/libc/manual/html_node/Important-Data-Types.html
<quote>
Unix systems did define size_t, in sys/types.h, but the definition was
usually a signed type.
</quote>
 
