quickest way to determine character sizes

H

hello smith

Hello,
I have an unsigned char array. I want to determine if each char's ascii
value is less than 127. Is there a faster way than looping through the
characters and checking if each is less than 127?

Thanks in advance.
 
J

Joona I Palaste

hello smith said:
Hello,
I have an unsigned char array. I want to determine if each char's ascii
value is less than 127. Is there a faster way than looping through the
characters and checking if each is less than 127?
Thanks in advance.

That's the most efficient algorithm there is, so that's as efficient
as you're going to get, unless your compiler vendor has supplied you
with a library function coded in pure machine code or something.
From an algorithm theory standpoint, you're asking "Can I see whether
each char is less than 127 without looking at each char to see if it's
less than 127?" What would you want? Clairvoyance?
 
E

Eric Sosman

hello said:
Hello,
I have an unsigned char array. I want to determine if each char's ascii
value is less than 127. Is there a faster way than looping through the
characters and checking if each is less than 127?

Probably not.

For the closely related question "less than 128" it
might be just a smidgen quicker to calculate the inclusive
OR of all the characters and then compare the result to
128 (this generalizes to any power-of-two limiting value).

However, I'll bet that the smidgen saved (if anything
is saved at all) will be too small to make any measurable
difference in the performance of the surrounding program.
Have you forgotten Jackson's Laws of Program Optimization?

First Law of Program Optimization:
Don't do it.

Second Law of Program Optimization (for experts only):
Don't do it yet.

I hope you'll pardon my saying so, but the wording of your
question suggests (in two ways) that the Second Law doesn't
apply to you at this stage in your development. Obey the
First Law until your expertise improves.
 
N

nrk

hello said:
Hello,
I have an unsigned char array. I want to determine if each char's ascii
value is less than 127. Is there a faster way than looping through the
characters and checking if each is less than 127?

Thanks in advance.

Are you looking for something like:

int checkArray(unsigned char *arr, int len) {
unsigned char temp = 0;

while ( len-- ) temp |= *arr++;
return temp < 127;
}

?

However, I doubt if this has any benefit other than making your code
difficult to understand.

-nrk.
 
E

Eric Sosman

nrk said:
Are you looking for something like:

int checkArray(unsigned char *arr, int len) {
unsigned char temp = 0;

while ( len-- ) temp |= *arr++;
return temp < 127;
}

?

However, I doubt if this has any benefit other than making your code
difficult to understand.

It also makes the code wrong: try it on an array
containing the two values 112 and 7.

Of course, this bug could be considered a "benefit"
in that it will teach the programmer (painfully) the
truth of Knuth's dictum: "Premature optimization is the
root of all evil."
 
N

nrk

Eric said:
It also makes the code wrong: try it on an array
containing the two values 112 and 7.

Of course, this bug could be considered a "benefit"
in that it will teach the programmer (painfully) the
truth of Knuth's dictum: "Premature optimization is the
root of all evil."

Whoops!! As Dan famously puts it "I should've engaged my brain before
posting" :) The eyes see 127, but the mind wants and pleads for 128.

To OP: Take Eric's advise and discard mine. Or, alternately, if you
believe experience is the best teacher, feel free to use that buggy code.

-nrk.
 
D

Dan Pop

In said:
I have an unsigned char array. I want to determine if each char's ascii
value is less than 127. Is there a faster way than looping through the
characters and checking if each is less than 127?

Yes, if the array is properly aligned and properly sized to be aliased
with an array of unsigned int (things that you can control):

if (UINT_MAX == 4294967295 && CHAR_BIT == 8 && sizeof(unsigned) == 4) {
unsigned acc = 0;
unsigned *p = (unsigned *)array, *q = p + sizeof array / sizeof *p;
while (p < q) acc |= *p++;
if ((acc & 0x80808080) == 0) allascii = 1;
else allascii = 0;
}
else {
/* check char by char */
}

One could argue that the bits inside an int could be randomly distributed,
but this is the kind of risk that you can reasonably take. However, if
you really want, you can explicitly check:

allascii = -1;
if (UINT_MAX == 4294967295 && CHAR_BIT == 8 && sizeof(unsigned) == 4) {
unsigned n = 0x80808080;
unsigned char *p = (unsigned char *)&n;
unsigned test = p[0] | p[1] | p[2] | p[3];
if (test == 0x80) {
unsigned acc = 0;
unsigned *p = (unsigned *)array, *q = p + sizeof array / sizeof *p;
while (p < q) acc |= *p++;
if ((acc & 0x80808080) == 0) allascii = 1;
else allascii = 0;
}
}
if (allascii < 0) {
/* check char by char */
}

There is another way of coding the loop, to avoid scanning the whole
array in case of an early non-ascii character:

allascii = 1;
while (p < q)
if ((*p++ & 0x80808080) != 0) {
allascii = 0;
break;
}

But now, the body of the loop may execute slower, so it's hard to say
which version to prefer, without knowing how your typical data looks
like. If most arrays contain only ascii characters, the original version
is probably better.

Dan
 
C

Christopher Benson-Manica

Joona I Palaste said:
What would you want? Clairvoyance?

I don't know, I think

int ask_the_oracle( cont char *question, ... );

if( ask_the_oracle("Do all the characters in the array that follows "
"each have an ASCII code less than 127?"
,&my_array) ) {
printf( "Yay!\n" );
}

would be quite useful.
 
K

Kenneth Brody

Joona said:
That's the most efficient algorithm there is, so that's as efficient
as you're going to get, unless your compiler vendor has supplied you
with a library function coded in pure machine code or something.
From an algorithm theory standpoint, you're asking "Can I see whether
each char is less than 127 without looking at each char to see if it's
less than 127?" What would you want? Clairvoyance?

If you knew certain things about the array, such as "it's always a
multiple of 4 bytes", you could, perhaps, speed things up by doing
something (probably platform-specific) like:

for each 4-byte entry "*pt" in array
{
if ( (*pt & 0x80808080) != 0 )
return something_is_greater_than_127;
}
return nothing_is_greater_than_127;

Unless, of course, the OP's "less than 127" is correct, and didn't
really mean "less than or equal to 127". In which case... "never
mind".

--

+---------+----------------------------------+-----------------------------+
| Kenneth | kenbrody at spamcop.net | "The opinions expressed |
| J. | http://www.hvcomputer.com | herein are not necessarily |
| Brody | http://www.fptech.com | those of fP Technologies." |
+---------+----------------------------------+-----------------------------+
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top