ispunct()

Ioannis Vranos

Does ispunct() return true for all symbols (like <>/@^&#@ etc.)?
 
Lew Pitcher

Ioannis Vranos wrote:
| ispunct() returns true for all symbols? (like <>/@^&#@ etc).

Caveat: You cross-posted this question to newsgroups that cover two different
computer languages. You may get two different answers, depending on which
language is described.

The ISO/IEC 9899:1999 draft for the ISO C'99 standard says of ispunct():
"The ispunct function tests for any printing character that is one of a
locale-specific set of punctuation characters for which neither isspace nor
isalnum is true. In the "C" locale, ispunct returns true for every printing
character for which neither isspace nor isalnum is true."

So, to answer your question, for ISO C'99, in the "C" locale, all symbols will
return true from ispunct, as they
a) are printing characters,
b) do not return true from isspace, and
c) do not return true from isalnum

Other locales may result in different values from C'99 ispunct for those characters.

Other levels of C standards compliance (i.e. C'90, K&R C, etc.) may result in
different values from ispunct for those characters.

Other languages may result in different values from ispunct for those characters.



--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 
Régis Troadec

Ioannis Vranos said:
ispunct() returns true for all symbols? (like <>/@^&#@ etc).

I would say yes.
It returns true for every printable character for which neither isspace()
nor isalnum() returns true. That is what the standard says.
The name ispunct() makes me think of punctuators, but I don't know whether it
is semantically related to them. The short program below shows which
printable characters make ispunct() return true and which make it
return false:

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    /* Walk through the range of printable characters,
       from 0x20 ' ' to 0x7E '~', in the 7-bit ASCII table. */
    char c = 0x20;
    while (c <= 0x7E)
    {
        /* The is*() functions expect an unsigned char value (or EOF),
           hence the cast. */
        printf("Is %c a printable char different from space or alphanum?"
               " %s\n", c, ispunct((unsigned char)c) ? "YES" : "NO");
        c += 1;
    }
    return 0;
}

Regis
 
Ioannis Vranos

Lew Pitcher said:
Caveat: You cross-posted this question to newsgroups that cover two different
computer languages. You may get two different answers, depending on which
language is described.


Yes, I know; however, I guessed that C99 ispunct() behaviour does not differ
from C++98 (and C90).
 
August Derleth

ispunct() returns true for all symbols? (like <>/@^&#@ etc).

From my manpage that shipped with gcc, ispunct() returns true for any
nonblank character that isn't a letter or a number. gcc says this
subroutine is conformant with ANSI-C.

What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.
 
Barry Schwarz

From my manpage that shipped with gcc, ispunct() returns true for any
nonblank character that isn't a letter or a number. gcc says this
subroutine is conformant with ANSI-C.

There are a minimum of 256 possible values for a char. Blank is only
1. If we stick to the English alphabet, there are 52 letters and ten
digits leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character which eliminates a significant number of these 193.
I see three possibilities:

You misquoted the man page.

The man page is less specific than it should be and therefore
misleading.

The man page is incorrect regarding compliance and therefore
misleading.
What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.



 
Ioannis Vranos

Barry Schwarz said:
There are a minimum of 256 possible values for a char.


We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.

Blank is only
1. If we stick to the English alphabet, there are 52 letters and ten
digits leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character which eliminates a significant number of these 193.


But that is OK with me, since I want to keep the (printable) keyboard symbols
of the ASCII table and filter out the rest, i.e. the letters and digits.
 
Keith Thompson

Ioannis Vranos said:
We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.

A quibble: (plain) char has the same characteristics as either signed
char or unsigned char, but it's a distinct type.
 
Richard Bos

Ioannis Vranos said:
Yes i know, however i guessed that C99 ispunct() behaviour does not differ
from C++98 (and C90).

Then why cross-post in the first place?

Richard
 
Richard Bos

Ioannis Vranos said:
We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.

True as such, but all is*()s take an int having the value of an unsigned
char (or EOF), not a signed or plain char.

Richard
 
Michiel Salters

Barry Schwarz said:
There are a minimum of 256 possible values for a char. Blank is only
1.

\t isn't blank?
If we stick to the English alphabet, there are 52 letters and ten
digits leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character which eliminates a significant number of these 193.

At least \0 must be eliminated, obviously. That can never be a printing
character. I don't understand the "Unfortunately" - do you want to
imply that ispunct('\0') should be true?
I see three possibilities:

You misquoted the man page.

The man page is less specific than it should be and therefore
misleading.

The man page is incorrect regarding compliance and therefore
misleading.

I think it's the second, but it's really nitpicking. The only word
missing is "printing", and that may even have been dropped in the quote.
What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.

What someone believes, based on a misinterpretation of a regex, can't be
helped. The regex is well defined and doesn't include those other
characters you refer to. Anyway, regex'es aren't C, not yet C++, and
were used only as a shorthand.
 
Keith Thompson

Barry Schwarz said:
[...]
What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.

In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.

What someone believes, based on a misinterpretation of a regex, can't be
helped. The regex is well defined and doesn't include those other
characters you refer to. Anyway, regex'es aren't C, not yet C++, and
were used only as a shorthand.

<OT>
I understand the intent of the shorthand, but according to my
(limited) understanding of how regular expressions are defined, the
regexp [A-Za-z] covers all characters from 'A' to 'Z' and from 'a' to
'z' inclusive in the current (locale-dependent) collating sequence.
If that collating sequence happens to put non-letters between letters
(as it might on an EBCDIC system), the regexp could match non-letters.
That's why things like [:alpha:], [:lower:], and [:upper:] were
introduced.
</OT>
 
