Reformulating a macro to use its argument just once

Francois Grieu

Consider this macro

// check if x, assumed of type unsigned char, is in range [0x20..0x7E]
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E)

Of course, this can't be safely used as in
if (ISVALID(*p++)) foo();
where p is a pointer to unsigned char.


Unless I err, this issue can be fixed (and often, performance
improved) using

#define ISVALID(x) ((unsigned char)((x)-0x20)<=0x7E-0x20)
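
One way to gain confidence in this rewrite is to compare both forms over every unsigned char value, and to confirm that an argument with a side effect is evaluated only once. The following is only a sketch; the names RANGE_TWICE, RANGE_ONCE, count_range_mismatches and advance_once are invented here for illustration and do not come from the post.

```c
#include <assert.h>
#include <limits.h>

/* Two-comparison form from the post: evaluates x twice. */
#define RANGE_TWICE(x) ((x) >= 0x20 && (x) <= 0x7E)
/* Single-evaluation form: values below 0x20 wrap to large unsigned values. */
#define RANGE_ONCE(x) ((unsigned char)((x) - 0x20) <= 0x7E - 0x20)

/* Counts unsigned char values on which the two forms disagree. */
int count_range_mismatches(void)
{
    int mismatches = 0;
    unsigned int v;

    for (v = 0; v <= UCHAR_MAX; v++) {
        unsigned char c = (unsigned char)v;
        if (RANGE_TWICE(c) != RANGE_ONCE(c))
            mismatches++;
    }
    return mismatches;
}

/* Applies the single-evaluation form to *p++ and reports how far p moved. */
int advance_once(void)
{
    unsigned char buf[2] = { 0x41, 0x10 };
    unsigned char *p = buf;

    if (RANGE_ONCE(*p++))
        assert(p == buf + 1);   /* p advanced exactly once */
    return (int)(p - buf);
}
```

Called from a test program, count_range_mismatches() should return 0 and advance_once() should return 1.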



But can we do something similar about this one?

// check if x, assumed of type unsigned char,
// is in range [0x20..0x7E] or greater than 0xC0
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)



Francois Grieu
 
Peter Nilsson

Francois said:
Consider this macro

// check if x, assumed of type unsigned char, is in range [0x20..0x7E]
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E)

Of course, this can't be safely used as in
if (ISVALID(*p++)) foo();
where p is a pointer to unsigned char.

So don't use it that way.

Note that ISVALID is in a class of reserved identifiers.

Unless I err, this issue can be fixed (and often, performance
improved) using

#define ISVALID(x) ((unsigned char)((x)-0x20)<=0x7E-0x20)

Why bother? The same issue exists for the ISXXXX macros in
<ctype.h>, but it generally _isn't_ a problem.

But can we do something similar about this one?

// check if x, assumed of type unsigned char,
// is in range [0x20..0x7E] or greater than 0xC0
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

If you assume x is in the range 0..0xFF, you can do...

#define VALID(x) (28245449 % (((x)/32+1)*6+5) == 0)
 
Francois Grieu

Peter Nilsson said:
Francois Grieu wrote: (name and comment fixed)
// check if x, assumed of type unsigned char,
// is in range [0x20..0x7E] or at least 0xC0
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

If you assume x is in the range 0..0xFF, you can do...

#define VALID(x) (28245449 % (((x)/32+1)*6+5) == 0)

That won't work. After (x)/32, 0x7E and 0x7F are indistinguishable.
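
To make the objection concrete: integer division by 32 maps 0x7E and 0x7F to the same quotient, so any expression built only on (x)/32 must treat them alike. A small sketch, assuming x stays within 0..0xFF (REFERENCE_VALID, MODULO_VALID, find_modulo_mismatches and check_same_bucket are names made up for this illustration):

```c
#include <assert.h>

/* Reference test with the usual multiple evaluation. */
#define REFERENCE_VALID(x) (((x) >= 0x20 && (x) <= 0x7E) || (x) >= 0xC0)
/* The proposed modulo formulation, copied from the thread. */
#define MODULO_VALID(x) (28245449 % (((x) / 32 + 1) * 6 + 5) == 0)

/* Counts disagreements over 0..0xFF; stores the first one in *first, or -1 if none. */
int find_modulo_mismatches(int *first)
{
    int x, mismatches = 0;

    *first = -1;
    for (x = 0; x <= 0xFF; x++) {
        if (REFERENCE_VALID(x) != MODULO_VALID(x)) {
            if (*first < 0)
                *first = x;
            mismatches++;
        }
    }
    return mismatches;
}

/* Both boundary values fall in the same 32-wide bucket. */
void check_same_bucket(void)
{
    assert(0x7E / 32 == 0x7F / 32);
}
```

Under that assumption the two macros should differ at exactly one value, 0x7F.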


Francois Grieu
 
Keith Thompson

Peter Nilsson said:
Francois said:
Consider this macro

// check if x, assumed of type unsigned char, is in range [0x20..0x7E]
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E)

Of course, this can't be safely used as in
if (ISVALID(*p++)) foo();
where p is a pointer to unsigned char.

So don't use it that way.

Note that ISVALID is in a class of reserved identifiers.

No, it ISn't.

Why bother? The same issue exists for the ISXXXX macros in
<ctype.h>, but it generally _isn't_ a problem.

There are no ISXXXX macros in <ctype.h>. There are a number of isXXXX
functions declared in <ctype.h> (which may also be implemented as
macros with the same names).
 
Ben Pfaff

Keith Thompson said:
Peter Nilsson said:
Francois said:
Consider this macro

// check if x, assumed of type unsigned char, is in range [0x20..0x7E]
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E)

Of course, this can't be safely used as in
if (ISVALID(*p++)) foo();
where p is a pointer to unsigned char.

So don't use it that way.

Note that ISVALID is in a class of reserved identifiers.

No, it ISn't.

In C89, the linker isn't required to be case-sensitive, so it's
risky to make that assertion.
 
Keith Thompson

Ben Pfaff said:
Keith Thompson said:
Peter Nilsson said:
Francois Grieu wrote:
Consider this macro

// check if x, assumed of type unsigned char, is in range [0x20..0x7E]
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E)

Of course, this can't be safely used as in
if (ISVALID(*p++)) foo();
where p is a pointer to unsigned char.

So don't use it that way.

Note that ISVALID is in a class of reserved identifiers.

No, it ISn't.

In C89, the linker isn't required to be case-sensitive, so it's
risky to make that assertion.

Good point (but I wonder *how* risky it is).

I'm really glad that the E* identifiers in <errno.h> are macros;
otherwise any identifier starting with 'e' and a digit or letter could
be dangerous.
 
Peter Nilsson

Keith said:
No, it ISn't.


There are no ISXXXX macros in <ctype.h>.

Well, at least I was right about it not being a problem!

Thanks for correcting my bilge.
 
Peter Nilsson

Obviously I read (and wrote) hastily.

I was wrong about ISVALID being reserved, and I missed the 7E/7F
subtlety.
Apologies.

// check if x, assumed of type unsigned char,
// is in range [0x20..0x7E] or at least 0xC0
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

If you assume x is in the range 0..0xFF, you can do...

#define VALID(x) (28245449 % (((x)/32+1)*6+5) == 0)

That won't work.

Good. It's not the sort of thing you should be doing anyway. ;-)
 
Francois Grieu

For the record, the thread is about reformulating the
following macro as another macro such that, like a function,
it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter. To me this is now
an intellectual and mathematical challenge, although it came
in practice (validating data according to European Regulation
3821/1985, Annex 1B)



Peter Nilsson proposed a modulo-based macro, then retracted it:
It's not the sort of thing you should be doing anyway. ;-)


But that gave me an idea! Assuming UCHAR_MAX is 0xFF,
a solution optimized towards using the least characters is

#define VALID(x) (((x)+128)%160<95)

The use of % makes it a dog, performancewise, on many
architectures (embedded 8 bit). I'm looking for a more
efficient solution. Making no assumption on UCHAR_MAX
would be a nice plus.
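
A brute-force comparison against the original macro is a cheap way to check the modulo formulation. Adding 128 and reducing modulo 160 sends [0x20..0x7E] to [0..94] and [0xC0..0xFF] to [0..63], while every other value lands in [95..159]. The sketch below assumes UCHAR_MAX is 0xFF, as the post does, and writes the macro with balanced parentheses; REFERENCE_VALID, MOD_VALID and count_mod_mismatches are illustrative names only.

```c
#include <assert.h>

/* Reference test with the usual multiple evaluation. */
#define REFERENCE_VALID(x) (((x) >= 0x20 && (x) <= 0x7E) || (x) >= 0xC0)
/* Single-evaluation modulo form; assumes x is in 0..0xFF. */
#define MOD_VALID(x) (((x) + 128) % 160 < 95)

/* Counts values in 0..0xFF on which the two forms disagree. */
int count_mod_mismatches(void)
{
    int x, mismatches = 0;

    for (x = 0; x <= 0xFF; x++) {
        if (REFERENCE_VALID(x) != MOD_VALID(x))
            mismatches++;
    }
    return mismatches;
}

/* Spot checks at the interval boundaries. */
void check_mod_boundaries(void)
{
    assert(!MOD_VALID(0x1F) && MOD_VALID(0x20));
    assert(MOD_VALID(0x7E) && !MOD_VALID(0x7F));
    assert(!MOD_VALID(0xBF) && MOD_VALID(0xC0));
}
```

Under the stated assumption, count_mod_mismatches() should return 0.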


Francois Grieu
 
Arthur J. O'Dwyer

For the record, the thread is about reformulating the
following macro as another macro such that, like a function,
it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter. To me this is now
an intellectual and mathematical challenge, although it came
in practice (validating data according to European Regulation
3821/1985, Annex 1B)

You might have better luck in rec.puzzles or comp.programming,
since the question you're asking doesn't have anything to do with C
per se (except the coincidence that the macro is written in C).
The large number of C gurus here is probably irrelevant, because
a C guru wouldn't try to do what you're trying in the first place.

Crossposted and followups set to comp.programming and c.l.c.

But that gave me an idea! Assuming UCHAR_MAX is 0xFF,
a solution optimized towards using the least characters is

#define VALID(x) (((x)+128)%160<95)

The use of % makes it a dog, performancewise, on many
architectures (embedded 8 bit). I'm looking for a more
efficient solution. Making no assumption on UCHAR_MAX
would be a nice plus.

-Arthur
 
Francois Grieu

For the record, the thread is about reformulating the
following macro as another macro such that, like a function,
it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter. To me this is now
an intellectual and mathematical challenge, although it came
in practice (validating data according to European Regulation
3821/1985, Annex 1B)

You might have better luck in rec.puzzles or comp.programming,
since the question you're asking doesn't have anything to do with C
per se (except the coincidence that the macro is written in C).

The subject, at least as stated, is too dependent on the definition
of C to be fit for rec.puzzles and even comp.programming.

Readers in these groups are not supposed to know about UCHAR_MAX,
unsigned char, why it matters that x appears only once in the whole
expression; much less the signedness of constants in C, which played
a role in some earlier parts of the thread.

The large number of C gurus here is probably irrelevant, because
a C guru wouldn't try to do what you're trying in the first place.

The problem of efficiently testing if a variable belongs to one of
a few intervals is very common, and I guess some C gurus would do

#include <limits.h>

/* define CharIsAlpha(x) testing if char x is a letter */
#if UCHAR_MAX==255 && 'A'==65 && 'Z'==90 && 'a'==97 && 'z'==122
/* unsigned char is 8 bit and the character set seems to be ASCII */
#define CharIsAlpha(x) ((unsigned char)(((unsigned char)(x)&223)-65)<26)
#else
#include <ctype.h>
#define CharIsAlpha(x) isalpha(x)
#endif

The optimization shown has merits: it can speed things up by as much as
an order of magnitude (no function call, no branch/cache miss), gives
compact code, and reduces dependency on the runtime. I can't think of a
platform where it fails and the compiler is not buggy (say, in what it
assumes the target character set is).
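
As a sketch of the case-folding idea, assuming ASCII and an 8-bit unsigned char: clearing bit 0x20 maps 'a'..'z' onto 'A'..'Z', after which a single unsigned subtraction tests one range. The constants (mask 0xDF, offset 'A') and the names ASCII_IS_ALPHA, count_alpha_mismatches and check_alpha are this sketch's own choices, not taken from the post.

```c
#include <assert.h>

/* Assumes ASCII and an 8-bit unsigned char. Clearing bit 0x20 maps
   'a'..'z' onto 'A'..'Z'; the unsigned subtraction then tests one range. */
#define ASCII_IS_ALPHA(x) ((unsigned char)(((unsigned char)(x) & 0xDF) - 'A') < 26)

/* Counts values in 0..0xFF where the macro disagrees with explicit range tests. */
int count_alpha_mismatches(void)
{
    int c, mismatches = 0;

    for (c = 0; c <= 0xFF; c++) {
        int expected = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (ASCII_IS_ALPHA(c) != expected)
            mismatches++;
    }
    return mismatches;
}

/* Spot checks on letters and on the characters adjacent to the letter ranges. */
void check_alpha(void)
{
    assert(ASCII_IS_ALPHA('q') && ASCII_IS_ALPHA('Q'));
    assert(!ASCII_IS_ALPHA('@') && !ASCII_IS_ALPHA('['));
}
```

On an ASCII platform count_alpha_mismatches() should return 0. This only covers the unaccented letters A to Z, whatever the locale.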


Francois Grieu
 
Eric Sosman

Ben said:
Keith Thompson said:
Peter Nilsson said:
Francois Grieu wrote:
Consider this macro

// check if x, assumed of type unsigned char, is in range [0x20..0x7E]
#define ISVALID(x) ((x)>=0x20 && (x)<=0x7E)
[...]

Note that ISVALID is in a class of reserved identifiers.
No, it ISn't.

In C89, the linker isn't required to be case-sensitive, so it's
risky to make that assertion.

... but he's using ISVALID as the identifier for a macro,
and such identifiers have no linkage.
 
hagman

Arthur said:
For the record, the thread is about reformulating the
following macro as another macro such that, like a function,
it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter. To me this is now
an intellectual and mathematical challenge, although it came
in practice (validating data according to European Regulation
3821/1985, Annex 1B)

You might have better luck in rec.puzzles or comp.programming,
since the question you're asking doesn't have anything to do with C
per se (except the coincidence that the macro is written in C).
The large number of C gurus here is probably irrelevant, because
a C guru wouldn't try to do what you're trying in the first place.

Crossposted and followups set to comp.programming and c.l.c.

But that gave me an idea! Assuming UCHAR_MAX is 0xFF,
a solution optimized towards using the least characters is

#define VALID(x) (((x)+128)%160<95)

The use of % makes it a dog, performancewise, on many
architectures (embedded 8 bit). I'm looking for a more
efficient solution. Making no assumption on UCHAR_MAX
would be a nice plus.

-Arthur


#define VALID(x) (((( ((x)>>5) ^4) *5)&~21)!=0)

After (((x)>>5) ^4) we have VALID <-> result not in {0,1,4}.
Multiplication by 5 causes no overflow (as we already divided by 32),
hence the only multiples of 5 with all bits contained in 21 are
indeed 0,5,20.
 
Francois Grieu

hagman said:
Reformulate the following macro as another macro such that,
like a function, it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter. (..This) came
in practice, validating data according to European Regulation
3821/1985, Annex 1B, Appendix 4, section 4, page 108 in http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CONSLEG:1985R3821:20060501:EN:PDF

(..) Assuming UCHAR_MAX is 0xFF,
a solution optimized towards using the least characters is

#define VALID(x) (((x)+128)%160<95)

The use of % makes it a dog, performancewise, on many
architectures (embedded 8 bit). I'm looking for a more
efficient solution. Making no assumption on UCHAR_MAX
would be a nice plus.



#define VALID(x) (((( ((x)>>5) ^4) *5)&~21)!=0)

After (((x)>>5) ^4) we have VALID <-> result not in {0,1,4}.
Multiplication by 5 causes no overflow (as we already divided by 32),
hence the only multiples of 5 with all bits contained in 21 are
indeed 0,5,20.

That works, except when x is 0x7F. Nice thing is that there is
no dependency on UCHAR_MAX, as noted.
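
The 0x7F exception can be reproduced with an exhaustive comparison. Because the macro only looks at (x)>>5, the whole block 0x60..0x7F is treated alike. This is a sketch assuming values in 0..0xFF; REFERENCE_VALID, SHIFT_VALID and find_shift_mismatches are names invented for the illustration.

```c
#include <assert.h>

/* Reference test with the usual multiple evaluation. */
#define REFERENCE_VALID(x) (((x) >= 0x20 && (x) <= 0x7E) || (x) >= 0xC0)
/* Shift-and-mask formulation from the thread. */
#define SHIFT_VALID(x) ((((((x) >> 5) ^ 4) * 5) & ~21) != 0)

/* Counts disagreements over 0..0xFF; stores the first one in *first, or -1 if none. */
int find_shift_mismatches(int *first)
{
    int x, mismatches = 0;

    *first = -1;
    for (x = 0; x <= 0xFF; x++) {
        if (REFERENCE_VALID(x) != SHIFT_VALID(x)) {
            if (*first < 0)
                *first = x;
            mismatches++;
        }
    }
    return mismatches;
}
```

Over 0..0xFF this should report a single disagreement, at 0x7F.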


Francois Grieu
 
Francois Grieu

I'm asking [*]
Reformulate the following macro as another macro such that,
like a function, it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter.

(..) Assuming UCHAR_MAX is 0xFF,
a solution optimized towards using the least characters is

#define VALID(x) (((x)+128)%160<95)

The use of % makes it a dog, performancewise, on many
architectures (embedded 8 bit). I'm looking for a more
efficient solution. Making no assumption on UCHAR_MAX
would be a nice plus.


Another try: I can replace the explicit % operator with multiplication
and truncation, which is more efficient when there is no hardware
support for %.

I think the following would work fine on many platforms
that have UCHAR_MAX equal to 0xFF

#include <limits.h>
#if UCHAR_MAX==0xFFu
#if USHRT_MAX==0xFFFFu
/* good on most "regular" platform */
#define VALID(x) ((unsigned short)(((unsigned char)(x)+0x81u)*0x199u)<0x9900u)
#else
/* same idea, independent of the range for unsigned short */
#define VALID(x) (((((unsigned char)(x)+0x81u)*0x199u)&0xFFFFu)<0x9900u)
#endif
#else
#error "no working solution so far"
#endif
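
The reasoning behind the constants, as far as it can be reconstructed: 0x199 is 409, and 409*160 = 65440 is just under 65536, so a product truncated to 16 bits behaves like a scaled remainder modulo 160. The threshold 0x9900 is exactly the truncated product for x = 0x7F. A sketch of an exhaustive check of the masked variant, assuming 8-bit unsigned char (REFERENCE_VALID, MUL_VALID and count_mul_mismatches are illustrative names):

```c
#include <assert.h>

/* Reference test with the usual multiple evaluation. */
#define REFERENCE_VALID(x) (((x) >= 0x20 && (x) <= 0x7E) || (x) >= 0xC0)
/* Multiply-and-truncate form, using the explicit 16-bit mask variant. */
#define MUL_VALID(x) (((((unsigned char)(x) + 0x81u) * 0x199u) & 0xFFFFu) < 0x9900u)

/* Counts values in 0..0xFF on which the two forms disagree. */
int count_mul_mismatches(void)
{
    int x, mismatches = 0;

    for (x = 0; x <= 0xFF; x++) {
        if (REFERENCE_VALID(x) != MUL_VALID(x))
            mismatches++;
    }
    return mismatches;
}

/* Spot checks at the interval boundaries. */
void check_mul_boundaries(void)
{
    assert(!MUL_VALID(0x1F) && MUL_VALID(0x20));
    assert(MUL_VALID(0x7E) && !MUL_VALID(0x7F));
    assert(!MUL_VALID(0xBF) && MUL_VALID(0xC0));
}
```

Under that assumption count_mul_mismatches() should return 0.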


Francois Grieu


[*] This came in practice, validating data on a platform with UCHAR_MAX=0xFF
according to European Regulation 3821/1985, Annex 1B, Appendix 4, section 4,
page 108 in this pdf:
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CONSLEG:1985R3821:20060501:EN:PDF
 
hagman

Francois said:
hagman said:
Reformulate the following macro as another macro such that,
like a function, it evaluates its argument only once.

// Check if x, assumed to be of type unsigned char,
// lies in either [0x20..0x7E] or [0xC0..UCHAR_MAX]
#define VALID(x) ((x)>=0x20 && (x)<=0x7E || (x)>=0xC0)

Using a function (including inline), or table, no matter
how practical, is disregarded hereafter. (..This) came
in practice, validating data according to European Regulation
3821/1985, Annex 1B, Appendix 4, section 4, page 108 in http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CONSLEG:1985R3821:20060501:EN:PDF

(..) Assuming UCHAR_MAX is 0xFF,
a solution optimized towards using the least characters is

#define VALID(x) (((x)+128)%160<95)

The use of % makes it a dog, performancewise, on many
architectures (embedded 8 bit). I'm looking for a more
efficient solution. Making no assumption on UCHAR_MAX
would be a nice plus.



#define VALID(x) (((( ((x)>>5) ^4) *5)&~21)!=0)

After (((x)>>5) ^4) we have VALID <-> result not in {0,1,4}.
Multiplication by 5 causes no overflow (as we already divided by 32),
hence the only multiples of 5 with all bits contained in 21 are
indeed 0,5,20.

That works, except when x is 0x7F. Nice thing is that there is
no dependency on UCHAR_MAX, as noted.


Oops, I misread the limit to be "<=0x7F", sorry.
I'm not sure if I can get my trick working in that case, unless by
plugging in some multiplication at the beginning:

#define OLDVALID(x) (((( ((x)>>5) ^4) *5)&~21)!=0)
#define PREMUL(x) ((((unsigned int)(x))*0x81u + 1u)>>7)
#define VALID(x) OLDVALID(PREMUL(x))

Note that PREMUL(x) = x for x<=0x7E and PREMUL(x) >=x+1 >=0x80 for
x>=0x7F.
However, this may fail under certain circumstances unless unsigned int
is more than 7 bits larger than unsigned char.
E.g., if unsigned int has less than 15 bits then PREMUL(0x80)==0 =>
0x80 wrongly invalid.
And in a world with 32bit ints and sufficiently big values of
UCHAR_MAX,
we have e.g. PREMUL(0x01FC07F1)==0
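
A brute-force check is worth running on this combination too. The sketch below assumes 8-bit unsigned char and an unsigned int of at least 16 bits, and uses fully parenthesized macros; REFERENCE_VALID, PREMUL_VALID and find_premul_mismatches are made-up names. Since PREMUL shifts every value from 0x7F upward by at least one, 0xBF is mapped to 0xC0, so the comparison is expected to flag that one value.

```c
#include <assert.h>

/* Reference test with the usual multiple evaluation. */
#define REFERENCE_VALID(x) (((x) >= 0x20 && (x) <= 0x7E) || (x) >= 0xC0)
/* Shift-and-mask test and the pre-multiplication step from the thread. */
#define OLDVALID(x) ((((((x) >> 5) ^ 4) * 5) & ~21) != 0)
#define PREMUL(x) ((((unsigned int)(x)) * 0x81u + 1u) >> 7)
#define PREMUL_VALID(x) OLDVALID(PREMUL(x))

/* Counts disagreements over 0..0xFF; stores the first one in *first, or -1 if none. */
int find_premul_mismatches(int *first)
{
    int x, mismatches = 0;

    *first = -1;
    for (x = 0; x <= 0xFF; x++) {
        if (REFERENCE_VALID(x) != PREMUL_VALID(x)) {
            if (*first < 0)
                *first = x;
            mismatches++;
        }
    }
    return mismatches;
}

/* PREMUL leaves values up to 0x7E alone and bumps larger ones. */
void check_premul_values(void)
{
    assert(PREMUL(0x7E) == 0x7Eu);
    assert(PREMUL(0x7F) == 0x80u);
    assert(PREMUL(0xBF) == 0xC0u);
}
```

Under these assumptions the check should report a single disagreement, at 0xBF, which the combined macro accepts although the reference rejects it.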
 
ais523

Francois said:
The problem of efficiently testing if a variable belongs to one of
a few intervals is very common, and I guess some C gurus would do

#include <limits.h>

/* define CharIsAlpha(x) testing if char x is a letter */
#if UCHAR_MAX==255 && 'A'==65 && 'Z'==90 && 'a'==97 && 'z'==122
/* unsigned char is 8 bit and the character set seems to be ASCII */
#define CharIsAlpha(x) ((unsigned char)(((unsigned char)(x)&223)-65)<26)
#else
#include <ctype.h>
#define CharIsAlpha(x) isalpha(x)
#endif

I doubt a real C guru would do this; they would allow for (as the
Standard allows for) a character set that was almost but not quite the
same as ASCII. In particular, I wouldn't be surprised if there were
locales and systems where isalpha() ought to return true for characters
like 'è' (which equals (char)232 on at least one system I use) and in
which the condition UCHAR_MAX==255 && 'A'==65 && 'Z'==90 && 'a'==97 &&
'z'==122 holds. This is still incorrect if you don't care about
locales, but less likely to trip you up in practice; still, the
theoretical incorrectness is enough reason to avoid it in my opinion.
 
