Can isspace('\0') ever be true?

Army1987

Does the Standard forbid isspace(0) from returning nonzero in any locale?
Is it possible that in some locale the loop
    for (cursor = nptr; isspace((unsigned char)*cursor); cursor++)
        continue;
slips off the end of the string pointed to by nptr?
Should I change the second expression to *cursor && isspace(/*etc*/),
or am I just being too paranoid?
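
The guarded form asked about here might look like the sketch below. The helper name skip_leading_space is hypothetical and does not appear in the thread; the only point is that the terminator test comes first, so the loop stops at the null character whatever isspace() says about it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): skip leading white space
 * without ever stepping past the terminating null character. */
static const char *skip_leading_space(const char *s)
{
    assert(s != NULL); /* precondition: s points to a string */

    /* Test for the terminator first, so the loop stops at '\0' even if
     * some locale were to classify the null character as white space. */
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    return s;
}
```

The returned pointer designates either the first non-white-space character or the terminating null character, never anything beyond it.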
 
Malcolm McLean

You are safe. The end of string marker must be zero, and isspace() must
return false for it.
 
Eric Sosman

Malcolm said:
You are safe. The end of string marker must be zero, and isspace() must
return false for it.

Can you cite chapter and verse for the second "must?"
 
Robert Gamble

Eric said:
Can you cite chapter and verse for the second "must?"

From 9899:1999 §7.4.1.10:
"The isspace function tests for any character that is a standard white-
space character or is one of a locale-specific set of characters for
which isalnum is false."
From 9899:1999 §5.2.1:
"Each set is further divided into a basic character set, whose
contents are given by this subclause, and a set of zero or more locale-
specific members (which are not members of the basic character set)
called extended characters."
...
"A byte with all bits set to 0, called the null character, shall exist
in the basic execution character set; it is used to terminate a
character string."

So the null character is a member of the basic character set and
therefore cannot be a locale-specific character. Since isspace()
returns true only for standard white-space characters or
locale-specific characters, of which the null character is neither, it
must return false when called with the null character.

Robert Gamble
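
The conclusion above can at least be spot-checked on a particular implementation. The sketch below assumes a hypothetical helper named isspace_accepts_null (not part of the thread or of any library) that selects a locale, evaluates isspace(0), and restores the previous locale. It only probes the implementation at hand and says nothing about what the Standard guarantees for other locales.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe: returns 1 if isspace(0) is nonzero in the named
 * locale, 0 if it is zero, and -1 if the locale cannot be selected. */
static int isspace_accepts_null(const char *locale_name)
{
    const char *current;
    char *saved;
    int result;

    assert(locale_name != NULL); /* NULL would only query the locale */

    /* Copy the current LC_CTYPE locale name, because a later call to
     * setlocale() may overwrite the string it returned earlier. */
    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -1;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        free(saved);
        return -1;
    }
    result = isspace(0) != 0;

    /* Restore the locale that was in effect on entry. */
    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

Calling isspace_accepts_null("C") checks the "C" locale, where isspace is true only for the standard white-space characters; passing "" checks the environment's native locale instead.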
 
Eric Sosman

... but "locale-specific set of characters" is not the
same thing as "set of locale-specific characters." In the
first case the adjectival phrase "locale-specific" applies
to "set," while in the second it applies to "characters."
"Republican delegates to the Congress" are not "delegates
to the Republican Congress."

Army1987 may have spotted a defect.
 
Army1987

Well, I hope no such locale actually exists. If it did, given the
description of strtol():
"First, they decompose the input string into three parts: an
initial, possibly empty, sequence of white-space characters (as
specified by the isspace function), a subject sequence resembling
an integer represented in some radix determined by the value of
base, and a final string of one or more unrecognized characters,
including the terminating null character of the input string.
Then, they attempt to convert the subject sequence to an integer,
and return the result."

I would expect that in such a locale, strtol("   ", &endptr, 10)
would be likely to behave strangely, and
    char p[2][4] = { "   ", "100" }; strtol(p[0], &endptr, 10)
would almost surely behave strangely. I was surprised that the
Standard does not (apparently) forbid that. The reason I asked
is that I am writing a replacement for strtoul() which doesn't handle
negative numbers (i.e. treats '-' as an invalid character).

#include <ctype.h>
#include <stdlib.h>

unsigned long int ustrtoul(const char *nptr, char **endptr,
                           int base)
{
    const char *cursor;
    for (cursor = nptr; isspace((unsigned char)*cursor); cursor++)
        continue;
    if (*cursor == '-') { /* Now *cursor is the first nonWS */
        if (endptr != NULL)
            *endptr = (char *)nptr; /* endptr should be const char**, */
                                    /* but is char** to mimic strtoul() */
        return 0;
    } else
        return strtoul(nptr, endptr, base);
}
If there were a locale with isspace('\0') != 0, that would cause
UB, but possibly even the "real" strtoul() would.
 
Army1987

On Sun, 15 Jul 2007 10:36:10 -0400, Eric Sosman wrote:
[can isspace(0) be nonzero, in any locale?]
Army1987 may have spotted a defect.

Army1987 said:
I would expect that in such a locale, strtol("   ", &endptr, 10)
would be likely to behave strangely, and
    char p[2][4] = { "   ", "100" }; strtol(p[0], &endptr, 10)
would almost surely behave strangely.

Better: since the string is { ' ', ' ', ' ', 0 }, dividing it into
three such parts, the last being nonempty, yields { ' ', ' ', ' ' },
{ } and { 0 }.

Army1987 said:
    for (cursor = nptr; isspace((unsigned char)*cursor); cursor++)

Maybe that should be isspace(*(unsigned char *)cursor)? (Or,
equivalently, cursor could be declared as a const unsigned char *.)
There is a difference if char is signed with a sign-and-magnitude or
ones' complement representation.

Army1987 said:
If there were a locale with isspace('\0') != 0, that would cause
UB, but possibly even the "real" strtoul() would.

No, it wouldn't. The first lines should be:
{
    const unsigned char *cursor;
    for (cursor = (const unsigned char *)nptr; *cursor && isspace(*cursor); cursor++)
        continue;
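
Putting that correction into the whole function gives something like the sketch below. It follows the ustrtoul() posted earlier in the thread; the explicit cast when initializing cursor and the assert() precondition are additions made here, on the assumption that nptr always points to a valid string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Like strtoul(), but treats a leading '-' as an invalid character. */
unsigned long ustrtoul(const char *nptr, char **endptr, int base)
{
    const unsigned char *cursor;

    assert(nptr != NULL); /* added precondition, not in the thread */

    /* Read the bytes as unsigned char, and test for the terminator
     * before asking isspace(), so the scan cannot run off the string. */
    cursor = (const unsigned char *)nptr;
    while (*cursor != '\0' && isspace(*cursor))
        cursor++;

    if (*cursor == '-') {
        /* No conversion is performed: report the start of the input,
         * as strtoul() does when it finds no subject sequence. */
        if (endptr != NULL)
            *endptr = (char *)nptr;
        return 0;
    }
    return strtoul(nptr, endptr, base);
}
```

Input consisting only of white space is passed on to strtoul(), which performs no conversion, returns 0, and stores nptr in *endptr.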
 
