ispunct()

Ioannis Vranos · May 1, 2004

ispunct() returns true for all symbols? (like <>/@^&#@ etc).

Ioannis Vranos

Lew Pitcher · May 1, 2004

Ioannis Vranos wrote:
| ispunct() returns true for all symbols? (like <>/@^&#@ etc).

Caveat: You cross-posted this question to newsgroups that cover two different
computer languages. You may get two different answers, depending on which
language is described.

The ISO/IEC 9899:1999 draft for the ISO C'99 standard says of ispunct()
"The ispunct function tests for any printing character that is one of a
locale-specific set of punctuation characters for which neither isspace nor
isalnum is true. In the "C" locale, ispunct returns true for every printing
character for which neither isspace nor isalnum is true."

So, to answer your question, for ISO C'99, in the "C" locale, all symbols will
return true from ispunct, as they
a) are printing characters,
b) do not return true from isspace, and
c) do not return true from isalnum

Other locales may result in different values from C'99 ispunct for those characters.

Other levels of C standards compliance (i.e. C'90, K&R C, etc.) may result in
different values from ispunct for those characters.

Other languages may result in different values from ispunct for those characters.

Lew Pitcher

Régis Troadec · May 1, 2004

Ioannis Vranos said:
ispunct() returns true for all symbols? (like <>/@^&#@ etc).

I would say yes.
It returns true for every printable character for which neither isspace()
nor isalnum() returns true. That's what the standard says.
I think of punctuators when I see ispunct(), but I don't know whether its name
is semantically related to them. The short program below shows which
printable characters make ispunct() return true and which make it
return false:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    /* Walk through the range of printable characters
       from 0x20 ' ' to 0x7E '~' in the 7-bit ASCII table */
    int c = 0x20;
    while (c <= 0x7E)
    {
        /* c stays within 0x20..0x7E, so it is a valid ispunct() argument */
        printf("Is %c a printable char different from space or alphanum?"
               " %s\n", c, ispunct(c) ? "YES" : "NO");
        c += 1;
    }
    return 0;
}

Regis

Ioannis Vranos · May 1, 2004

Lew Pitcher said:
Caveat: You cross-posted this question to newsgroups that cover two different
computer languages. You may get two different answers, depending on which
language is described.

Yes, I know; however, I guessed that C99 ispunct() behaviour does not differ
from C++98 (and C90).

Ioannis Vranos

August Derleth · May 1, 2004

ispunct() returns true for all symbols? (like <>/@^&#@ etc).

From my manpage that shipped with gcc, ispunct() returns true for any
nonblank character that isn't a letter or a number. gcc says this
subroutine is conformant with ANSI-C.

What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

Barry Schwarz · May 2, 2004

From my manpage that shipped with gcc, ispunct() returns true for any
nonblank character that isn't a letter or a number. gcc says this
subroutine is conformant with ANSI-C.

There are a minimum of 256 possible values for a char. Blank is only
1. If we stick to the English alphabet, there are 52 letters and ten
digits leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character which eliminates a significant number of these 193.
I see three possibilities:

You misquoted the man page.

The man page is less specific than it should be and therefore
misleading.

The man page is incorrect regarding compliance and therefore
misleading.

What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.

Ioannis Vranos · May 2, 2004

Barry Schwarz said:
There are a minimum of 256 possible values for a char.

We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.

Blank is only
1. If we stick to the English alphabet, there are 52 letters and ten
digits leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character which eliminates a significant number of these 193.

But that is OK with me, since I want to use the (printable) keyboard symbols of
the ASCII table and filter out the rest, i.e. letters and digits.

Ioannis Vranos

Keith Thompson · May 2, 2004

Ioannis Vranos said:
We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.

A quibble: (plain) char has the same characteristics as either signed
char or unsigned char, but it's a distinct type.

Richard Bos · May 3, 2004

Ioannis Vranos said:
Yes i know, however i guessed that C99 ispunct() behaviour does not differ
from C++98 (and C90).

Then why cross-post in the first place?

Richard

Richard Bos · May 3, 2004

Ioannis Vranos said:
We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.

True as such, but all is*()s take an int having the value of an unsigned
char (or EOF), not a signed or plain char.

Richard

Michiel Salters · May 3, 2004

Barry Schwarz said:
There are a minimum of 256 possible values for a char. Blank is only
1.

\t isn't blank?

If we stick to the English alphabet, there are 52 letters and ten
digits leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character which eliminates a significant number of these 193.

At least \0 must be eliminated, obviously. That can never be a printing
character. I don't understand the "Unfortunately" - do you want to
imply that ispunct('\0') should be true?

I see three possibilities:

You misquoted the man page.

The man page is less specific than it should be and therefore
misleading.

The man page is incorrect regarding compliance and therefore
misleading.

I think it's the second, but it's really nit picking. The only word
missing is non-printing, and that may even be dropped in the quote.

What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.

What someone believes, based on a misinterpretation of a regex, can't be
helped. The regex is well defined and doesn't include those other
characters you refer to. Anyway, regexes aren't C, not yet C++, and
were used only as a shorthand.

Keith Thompson · May 4, 2004

Barry Schwarz said:

[...]

What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.

What someone believes, based on a misinterpretation of a regex, can't be
helped. The regex is well defined and doesn't include those other
characters you refer to. Anyway, regexes aren't C, not yet C++, and
were used only as a shorthand.

<OT>
I understand the intent of the shorthand, but according to my
(limited) understanding of how regular expressions are defined, the
regexp [A-Za-z] covers all characters from 'A' to 'Z' and from 'a' to
'z' inclusive in the current (locale-dependent) collating sequence.
If that collating sequence happens to put non-letters between letters
(as it might on an EBCDIC system), the regexp could match non-letters.
That's why things like [:alpha:], [:lower:], and [:upper:] were
introduced.
</OT>
