Storage of char in 64 bit machine

Mikhail Teterin

Jack said:
Almost every single DSP in existence, for starters.

When a concept of "currency" becomes applicable to a DSP, I may accept this
example... I was, of course, talking about general-purpose computers...
I routinely work on a TI DSP these days where CHAR_BIT is 16 and
sizeof(int) is 1.

This is interesting -- why is this a char, and not, say, a short then? Is
there an 8-bit type at all?

-mi
 
Mikhail Teterin

Eric said:
As one can admire the skills of the buggy whip makers and the
inordinate amount of labor they would expend on a single whip, so
one can admire the "cool tricks" of the programmer-craftsmen and
their perseverance

The major difference between the relationship of a buggy whip maker to a
car and that of an old-time programmer to a modern computer -- the
difference that pretty much destroys your entire analogy -- is that a car
will not run any faster when whipped.

A well-crafted program, however, _will_ run faster on a modern computer.

The vast improvements in speed and memory of modern computers are
supposed to enable them to run programs better -- not to allow the
programmers to be sloppier.

I'm not pushing *you* to write programs as efficient as the ones you wrote
when you "lived in the cage". But don't stop *me* from trying to keep my
programs runnable on that ancient hardware -- even if they'll never see it.

Running on a DSP (where CHAR_BIT is 16) is not a valid argument. The only
sound one is that a compiler can do a better job. But I doubt it can --
I'll verify and post the results...

Yours,

-mi
 
Frederick Gotham

Mikhail Teterin posted:
This is interesting -- why is this a char, and not, say, a short then? Is
there an 8-bit type at all?


Define "8-bit type". Do you mean 8 object representation bits, or 8 value
representation bits?

If you mean object representation bits, then there can't be (unless there's a
type which is provided as an extension).

If CHAR_BIT is 16, then all of the three "char" types must have 16 object
representation bits.
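To check these claims on a particular implementation, a small sketch like the following could be used. It assumes a C99 <stdint.h>, and the helper names are hypothetical. It relies on the rule that <stdint.h> defines UINT8_MAX exactly when it provides uint8_t, and uint8_t can only exist when CHAR_BIT is 8, so on a DSP like the TI part mentioned above has_exact_uint8() would be expected to return 0.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: 1 if the implementation provides an exact 8-bit
   unsigned type. UINT8_MAX is defined if and only if uint8_t is, and
   uint8_t cannot exist unless CHAR_BIT == 8, because no object can be
   smaller than a char. */
int has_exact_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}

/* sizeof of every char type is 1 by definition, so each one
   occupies exactly CHAR_BIT bits of storage. */
int char_storage_bits(void)
{
    return (int)(sizeof(unsigned char) * CHAR_BIT);
}

/* Storage bits of int; with CHAR_BIT == 16 and a 16-bit int,
   sizeof(int) is 1 and this returns 16. */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```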
 
Mikhail Teterin

Stephen said:
The implementation is likely to have a very, very clever strcmp() that
will perform at least as well as your code (possibly doing the same thing
internally, if it's known to be safe) and likely even better if the
compiler is reasonably modern, due to special knowledge and treatment of
common functions/idioms.

Well, here are some benchmarks comparing strcmp() against a single integer
comparison for short character strings (4 characters).

FreeBSD6.1/i386 using `gcc version 3.4.4 [FreeBSD] 20050518'
with "-pedantic -Wall -O5 -pipe -march=pentium4 -DNDEBUG -DNODEBUG
-fomit-frame-pointer":

./bench 100000000
str: used 1119486 microseconds
int: used 406449 microseconds

FreeBSD6.1/amd64 using the same compiler as above
with "-pedantic -Wall -O5 -pipe -march=opteron -DNDEBUG -DNODEBUG
-fomit-frame-pointer":

obj.amd64/bench 100000000
str: used 1403187 microseconds
int: used 392897 microseconds

AIX5.2/powerpc using IBM's cc with cc -O3 (-O4 would not link):

./aix-bench 100000000
str: used 3240000 microseconds
int: used 630000 microseconds

Solaris8/sparc using Sun's 6u2 compiler with `-v -fast' (fast is the "macro"
option turning on all possible optimizations including Sun's own libmil):

./sun4u-bench 100000000
str: used 7020000 microseconds
int: used 1300000 microseconds

The same using 64-bit binaries (-v -fast -xarch=v9b):

str: used 7920000 microseconds
int: used 1470000 microseconds

Solaris10/opteron using Sun's `cc: Sun C 5.7 2005/01/07' compiler with
`-fast -xarch=amd64':

./sunx86-bench 100000000
str: used 962088 microseconds
int: used 319509 microseconds

Of the above, Sun's cc/libmil definitely has the special "knowledge
and treatment of common functions/idioms" such as strcmp(), but even there
using strcmp() was over 5 times slower...

It seems that for limited cases like this -- when the strings are of
the same length and fit nicely into an integer type -- treating them as such
is hugely beneficial. And, contrary to authoritative assertions posted in
this thread, the compiler is NOT able to detect such cases.
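The benchmark source was posted as an attachment and is not reproduced here. As a hedged alternative (a swapped-in technique, not Mikhail's code), an integer key can be built portably from the character values with shifts, which avoids any dependence on byte order, alignment or union layout. The name pack_code is hypothetical, and whether it is faster than strcmp() on a given platform would need its own measurement.

```c
#include <stdint.h>

/* Hypothetical helper: build an integer key from a 3-letter code such
   as "USD" using shifts. The result depends only on the character
   values, not on byte order, alignment or padding.
   Assumes each character value fits in 8 bits (true for A-Z in ASCII).
   uint_least32_t always exists, even where int32_t does not. */
uint_least32_t pack_code(const char *code)
{
    return ((uint_least32_t)(unsigned char)code[0] << 16)
         | ((uint_least32_t)(unsigned char)code[1] << 8)
         |  (uint_least32_t)(unsigned char)code[2];
}
```

Because the first character lands in the highest bits, comparing two keys with < also gives the same order strcmp() would for such codes, which a byte-reinterpreting trick only does on big-endian machines.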

I'm attaching (sigh) the simple-minded C-code to this posting -- please,
poke at it and/or reproduce my results... It even allows for CHAR_BIT to be
16 :)

Thanks!

-mi
 
Frederick Gotham

Frederick Gotham posted:

If CHAR_BIT is 16, then all of the three "char" types must have 16
object representation bits.


Just to clarify:

It's possible to have a 16-bit unsigned char and unsigned short, and yet have
an 8-bit signed char (which would contain 8 bits of padding).
 
Mikhail Teterin

Frederick said:
It's possible to have a 16-bit unsigned char and unsigned short, and yet
have an 8-bit signed char (which would contain 8 bits of padding).

But strcmp() expects "char *", so unsigned chars are not of concern for my
example, right?

-mi
 
Keith Thompson

Mikhail Teterin said:
I'm attaching (sigh) the simple-minded C-code to this posting -- please,
poke at it and/or reproduce my results... It even allows for CHAR_BIT to be
16 :)

Didn't we just go over this? Is there some reason you couldn't have
posted the C code as part of your article?
 
Frederick Gotham

Mikhail Teterin posted:
But strcmp() expects "char *", so unsigned chars are not of concern for my
example, right?


Sorry, I haven't looked at your example -- I just jumped into the thread when
I saw mention of the technicalities of integer types... like a bee to honey!
;)
 
Mikhail Teterin

Keith said:
Didn't we just go over this? Is there some reason you couldn't have
posted the C code as part of your article?

We did. And I ended up convinced that only inertia (and the desire to force
a newcomer to obey the rules of the club) is what makes this an issue in
the first place.

People whose newsreaders are not MIME-aware will just see these
textual attachments as part of the article.

MIME-aware newsreaders will be able to handle them better this way...

Sorry; if it were one file, I would've inlined it, but three -- that's just
too much trouble.

-mi
 
Keith Thompson

Mikhail Teterin said:
But strcmp() expects "char *", so unsigned chars are not of concern for my
example, right?

I haven't looked at your example lately, but strcmp() in effect works
with unsigned chars. Its declaration makes it look like it deals
with plain chars:

int strcmp(const char *s1, const char *s2);

but:

The sign of a nonzero value returned by the comparison functions
memcmp, strcmp, and strncmp is determined by the sign of the
difference between the values of the first pair of characters
(both interpreted as unsigned char) that differ in the objects
being compared.
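The rule in the quoted text can be expressed as a short reference function. This is only an illustrative sketch of the specified behaviour, not any particular library's strcmp(); ref_strcmp and sign_of are hypothetical names. The case that matters is a character with the high bit set, such as '\xe9': where plain char is signed, a naive char-by-char comparison would see a negative value, whereas strcmp() must compare it as an unsigned char (233 when CHAR_BIT is 8) and so rank it after 'a'.

```c
#include <string.h>

/* Sketch of the specified strcmp() semantics: walk both strings while
   the characters match, then take the sign from the first differing
   pair, both read as unsigned char. */
int ref_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;

    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (*p > *q) - (*p < *q);   /* -1, 0 or 1 */
}

/* strcmp() only promises a sign, so reduce its result to -1, 0 or 1
   before comparing it with ref_strcmp(). */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}
```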
 
Keith Thompson

Mikhail Teterin said:
We did. And I ended up convinced that only inertia (and the desire to force
a newcomer to obey the rules of the club) is what makes this an issue in
the first place.

People whose newsreaders are not MIME-aware will just see these
textual attachments as part of the article.

MIME-aware newsreaders will be able to handle them better this way...

Sorry; if it were one file, I would've inlined it, but three -- that's just
too much trouble.

Then you reached the wrong conclusion.

I, for one, will ignore any attachments posted to this newsgroup. I
will consider changing this policy only if there's a general consensus
among the regulars that text-only attachments are acceptable. (I'm
not claiming that this is a policy of the newsgroup; I speak only for
myself.)
 
Mikhail Teterin

Keith said:
I, for one, will ignore any attachments posted to this newsgroup.

I have not spent much time on this group, but, so far, I have not seen
anything that would make me truly saddened by your decision...
I will consider changing this policy only if there's a general consensus
among the regulars that text-only attachments are acceptable.

You are confirming my impression that the actual merits of text-only
attachments are secondary (or even fully irrelevant) to your decision, with
the annoyance over my violating the unwritten and unofficial "rules of the
club" being the primary...

-mi
 
Ian Collins

Keith said:
Then you reached the wrong conclusion.

I, for one, will ignore any attachments posted to this newsgroup. I
will consider changing this policy only if there's a general consensus
among the regulars that text-only attachments are acceptable. (I'm
not claiming that this is a policy of the newsgroup; I speak only for
myself.)
In this case, I found the attachments handy: just a quick 'save all' in
Mozilla, rather than several copy-and-pastes.

For a single file, inline is probably best, but for several files,
attachments can help.
 
Al Balmer

I have not spent much time on this group, but, so far, I have not seen
anything that would make me truly saddened by your decision...


You are confirming my impression that the actual merits of text-only
attachments are secondary (or even fully irrelevant) to your decision, with
the annoyance over my violating the unwritten and unofficial "rules of the
club" being the primary...
Actually, I think the annoyance comes from your continuing to argue
about it after being informed about the "unwritten and unofficial
rules of the club". The "club", incidentally, includes very many
usenet groups. As it says in http://www.netmeister.org/news/usenet/
after mentioning that many users still use dialup:

"Let this be just one reason why you should never ever post an
attachment into a newsgroup that does not specifically state in its
Charta that it is desired. Usually, attachments are only appropriate
in *binaries*-newsgroups."
 
websnarf

Mikhail said:
Lew said:
Actually, a character isn't 8 bits. I'm simplifying a bit, but a
character is guaranteed to be /at least/ 8 bits wide, and is permitted
to be as wide as necessary. For all we (or you) know, a char might be
64 bits wide on your platform.

So, comparing, say, 4-char arrays (like currency codes) can NOT be done in
the following way?

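#include <stdint.h>
#include <stdio.h>

typedef union {
    char acCUR[4];   /* 3-letter code plus '\0', e.g. "USD" */
    int32_t iCUR;    /* the same 4 bytes viewed as one integer */
} xCUR;

/* Prints whether the two codes match and returns nonzero if they do. */
int
CurEqual(const xCUR *c1, const xCUR *c2)
{
    int same = (c1->iCUR == c2->iCUR);

    if (same)
        printf("Same currency %s\n", c1->acCUR);
    else
        printf("%s and %s are different\n",
               c1->acCUR, c2->acCUR);
    return same;
}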

Having to call a strcmp() in such cases seems like a bad waste to me, but I
don't see how the compiler could possibly optimize such code without the
trick above...

The compiler cannot do that optimization because it's not correct.
strcmp() stops executing its inner loop once it reads a '\0'. I.e.,
it's possible for the strcmp()s to be equal where the int32_t's are not
equal. Also the compiler is allowed to align struct entries as it
likes. So on a 64-bit big-endian system, the int32_t might not
intersect with any of the 4 acCUR[] characters.
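The first point can be reproduced with the union from the question: two objects whose strings both read "AB" but whose bytes after the terminator differ. This sketch assumes CHAR_BIT is 8, so that int32_t spans exactly the four chars; equal_as_strings and equal_as_ints are hypothetical names. With a = {'A','B','\0','x'} and b = {'A','B','\0','y'}, strcmp() reports the codes equal while the int32_t members differ.

```c
#include <stdint.h>
#include <string.h>

typedef union {
    char acCUR[4];
    int32_t iCUR;
} xCUR;

/* Compare as C strings: stops at the first '\0', so any bytes after
   the terminator are ignored. */
int equal_as_strings(const xCUR *a, const xCUR *b)
{
    return strcmp(a->acCUR, b->acCUR) == 0;
}

/* Compare all four bytes at once through the int32_t member,
   including whatever follows the terminator. */
int equal_as_ints(const xCUR *a, const xCUR *b)
{
    return a->iCUR == b->iCUR;
}
```

The integer trick is therefore only equivalent to strcmp() when every code is exactly three characters and the fourth byte is always '\0', which is the precondition Mikhail states later for currency codes.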
 
E

Eric Sosman

Mikhail said:
[...]
A well-crafted program, however, _will_ run faster on a modern computer.

Good. Splendid. But: HOW MUCH faster?

Elsethread you have posted an attempt to quantify HOW MUCH,
and your results suggest that the particular "cool" trick you
favor will save ...

0.00000000713037 seconds per comparison.

(Frankly, I doubt that the measurement accuracy justifies the
number of "significant" digits you've reported, but let that
pass: 0.00000000713037 seconds it is. Congratulations on your
savings; don't spend it all in one place.)

To save one second, you need to make 140245176+ comparisons.

To save the -- what? hour? let's be generous and say thirty
minutes -- to recoup the thirty minutes you have already spent
on this folly, JUST TO BREAK EVEN, you need to make 252441317912
comparisons. That number is almost fifty-nine times larger than
the largest value of an `unsigned int' on many implementations;
you will have trouble even *counting* the number of comparisons
you must make before you break even.

Thesis: A program that makes 252441317912 comparisons doesn't
need to make those comparisons faster; it needs a way to avoid
all those stupid comparisons!
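Eric's arithmetic can be re-derived from the i386 timings in the benchmark post (microseconds for 100000000 comparisons). A minimal sketch with hypothetical function names: with those inputs, saving_per_call() gives about 7.13037e-9 seconds, calls_to_save(1.0, ...) just over 140245176, and calls_to_save(1800.0, ...) about 2.5244e11, matching the figures quoted above to within rounding.

```c
/* Seconds saved per comparison, given total run times (in
   microseconds) of the slow and fast versions over `iterations`. */
double saving_per_call(double slow_us, double fast_us, double iterations)
{
    return (slow_us - fast_us) * 1e-6 / iterations;
}

/* Number of comparisons needed before the per-comparison saving
   adds up to `target_s` seconds. */
double calls_to_save(double target_s, double saving_s)
{
    return target_s / saving_s;
}
```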

Mikhail, I can see your error and understand it and sympathize
with it, because in my mis-spent youth I made the same mistake. I
claim no superiority; it's quite possible (maybe even likely) that
my sins were greater than yours are, that I exercised even worse
judgement than you are exercising now. As a sort of reformed drunk
I address the AA meeting: You are on the broad highway to Hell. You
have become besotted (as I in my time was besotted) with "clever"
tricks and "subtle" devices, and (like me) you have not stopped to
count the cost. Consider: You are giving up portability, you are
giving up clarity, you are diminishing maintainability -- and for
what? For a gain of less than one second.

Bad trade, Mikhail. Very bad trade. Chortle over the cleverness
of whatever gadget takes your fancy -- but don't use it. Just don't.
You will come to regret your ingenuity -- trust me on this; I deployed
above-average ingenuity, and in the long run got into above-average
trouble. Stop, sinner, while there is yet time.

Enuf. I'm outta here.
 
Eric Sosman

Mikhail said:
So, comparing, say, 4-char arrays (like currency codes) can NOT be done in
the following way?

typedef union {
char acCUR[4];
int32_t iCUR;
} xCUR;
[...]

Having to call a strcmp() in such cases seems like a bad waste to me, but I
don't see how the compiler could possibly optimize such code without the
trick above...

The compiler cannot do that optimization because it's not correct.
strcmp() stops executing its inner loop once it reads a '\0'. I.e.,
it's possible for the strcmp()s to be equal where the int32_t's are not
equal. Also the compiler is allowed to align struct entries as it
likes. So on a 64-bit big-endian system, the int32_t might not
intersect with any of the 4 acCUR[] characters.

Agree with the first part but not with the second. Look
again: it's not a struct, but a union.
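Eric's correction can be checked directly: a pointer to a union, suitably converted, points to each of its members, so every member begins at offset 0 and acCUR and iCUR start at the same address whatever the alignment rules. A minimal sketch using offsetof from <stddef.h>; members_start_together is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

typedef union {
    char acCUR[4];
    int32_t iCUR;
} xCUR;

/* No padding may precede any union member, so both offsets are 0
   on every conforming implementation. */
int members_start_together(void)
{
    return offsetof(xCUR, acCUR) == 0 && offsetof(xCUR, iCUR) == 0;
}
```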
 
Keith Thompson

Keith Thompson said:
I, for one, will ignore any attachments posted to this newsgroup. I
will consider changing this policy only if there's a general consensus
among the regulars that text-only attachments are acceptable. (I'm
not claiming that this is a policy of the newsgroup; I speak only for
myself.)

I believe I was too hasty in making this statement. I've started a
new thread to discuss this issue.
 
websnarf

Mikhail said:
I know. Currencies, however, are all 3-character strings (plus the
terminating '\0'). Thus they are perfectly suited to be treated as int32_t,
when convenient.

If that is true, then in fact this is a useful performance boost, but
it's platform-specific. Personally, I would just capture it like this:
*((int32_t *) &currency) rather than bothering with the union.
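If currency is a char array, the cast above reads it through an int32_t lvalue, which falls outside C's effective-type (aliasing) rules and may also be misaligned. A hedged alternative with the same intent is to copy the bytes with memcpy(). code_bits is a hypothetical name, and the sketch assumes CHAR_BIT is 8 so that int32_t spans exactly four chars. The resulting value still depends on byte order, so it suits equality tests rather than ordering. Compilers commonly turn a fixed 4-byte memcpy() into a single load at higher optimization levels, but that is worth confirming on the target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the four bytes of a code (e.g. "USD" plus
   its '\0') into an int32_t. memcpy() avoids the alignment and
   aliasing problems of casting a char pointer to int32_t *. */
int32_t code_bits(const char code[4])
{
    int32_t v;
    memcpy(&v, code, sizeof v);
    return v;
}
```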
That it is not 100% portable has already been rammed into me by the friendly
folks on this board. I'd like to see an example of an actual hardware/compiler
combo where it would not work, though...

I am pretty sure there are real 64-bit systems (though likely they are
marginal) that will fail to do your trick correctly. Not AMD64, but
some old Crays or Sparc64s might in fact fail (they need to be
big-endian, and align struct/union entries to 64 bits). And obviously
those silly DSPs that don't support int32_t would just fail to
compile your code.
 
