Why does long w = 'word' fail?

petertwocakes

Hi,

I'm trying to test a sequence of 4 characters from a ptr buffer
against a long, but the test fails even though I think they should
have the same value. e.g :


char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;

Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239
Shouldn't they be same?

If not, how do I test 4 character sequences in chunks like this?
(without laboriously testing each char individually)
I'm in a very large text buffer incrementing the current ptr until I
hit "word"

Thanks
 
Richard Tobin

Well, this could be another troll, but this time I'll give it
the benefit of the doubt.

petertwocakes said:
char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;
Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239
Shouldn't they be same?

There's no requirement for multi-character character literals to
work in any particular way. If you look at the hex values, you
will see that it's put the characters in the opposite order from
what you're expecting. It's not portable; don't do it without
a very good reason.
If not, how do I test 4 character sequences in chunks like this?
Don't.

(without laboriously testing each char individually)
I'm in a very large text buffer incrementing the current ptr until I
hit "word"

Why not use the strstr() function? It's quite likely to be implemented
efficiently.
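A minimal sketch of the strstr() suggestion, assuming the text buffer is NUL-terminated (strstr() requires that). The helper name find_word is hypothetical, not something from the original post:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of "word" in a
   NUL-terminated buffer, or -1 if it does not occur.
   find_word is a hypothetical helper name. */
long find_word(const char *text)
{
    const char *hit;

    assert(text != NULL);
    hit = strstr(text, "word");
    return hit ? (long)(hit - text) : -1L;
}
```

For example, find_word("a word here") returns 2, and find_word("nothing") returns -1.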

Incidentally, if you're intending to increment wordArrayPtr and
dereference it to get a long at each position, you've got another
problem: it won't work on machines where longs have to be aligned
properly.
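If a 4-byte chunk really must be read from an arbitrary position, one common way around the alignment problem is to memcpy() the bytes into a properly aligned object instead of casting the pointer. The sketch below is an illustration, not part of the original code: load_u32 is a hypothetical name, and it uses uint32_t rather than long on the assumption that exactly four bytes are wanted (long is eight bytes on many 64-bit systems).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies four bytes starting at p into a uint32_t. Unlike *(uint32_t *)p,
   this is valid for any address; the caller must guarantee that at least
   four bytes are readable at p. The resulting value still depends on the
   machine's byte order. load_u32 is a hypothetical helper name. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Two loads compare equal exactly when the underlying four bytes are equal, so load_u32(ptr) == load_u32("word") does not rely on a multi-character constant.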

-- Richard
 
Ike Naar

char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;
Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239

It's an endianness issue; apparently you're running the code on a
little-endian machine, where the bytes 'w', 'o', 'r', 'd' (hex 0x77,
0x6f, 0x72, 0x64) in a ``long'' variable are interpreted as 0x64726f77
(decimal 1685221239). On a big-endian machine, the bytes would be
interpreted as 0x776f7264 (decimal 2003792484).
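The two interpretations can be reproduced with explicit shifts, independent of the machine the code runs on. This is only a sketch; the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine four bytes with the FIRST byte as the least significant,
   which is how a little-endian machine reads them from memory. */
uint32_t as_little_endian(const unsigned char *b)
{
    assert(b != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Combine four bytes with the FIRST byte as the most significant,
   which is how a big-endian machine reads them from memory. */
uint32_t as_big_endian(const unsigned char *b)
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         | (uint32_t)b[3];
}
```

With the bytes 0x77, 0x6f, 0x72, 0x64, as_little_endian() gives 0x64726f77 (1685221239) and as_big_endian() gives 0x776f7264 (2003792484), the two numbers from the original post.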
If not, how do I test 4 character sequences in chunks like this?
(without laboriously testing each char individually)

Use ``memcmp(&wordLong, &longPtrVal, 4)'', if that isn't too laborious
for your taste.
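A variation on the memcmp() idea, offered as a sketch rather than as what was literally suggested: compare the four bytes at each position of the buffer directly against the string literal "word". That avoids both the multi-character constant and the cast to long *, and works on a buffer that is not NUL-terminated. The name find_word_n and the explicit length parameter are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first "word" in buf[0..len), or -1 if it
   does not occur. The buffer need not be NUL-terminated.
   find_word_n is a hypothetical helper name. */
long find_word_n(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "word", 4) == 0)
            return (long)i;
    }
    return -1L;
}
```

Since memcmp() compares bytes in memory order, the result is the same on little-endian and big-endian machines.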
 
petertwocakes

petertwocakes   said:
char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;
Yet, in the end wordLong = 2003792484, but longPtrVal  = 1685221239

It's an endianness issue; apparently you're running the code on a
little-endian machine,
where the bytes 'w', 'o', 'r', 'd' (hex 0x77, 0x6f, 0x72, 0x64) in
a ``long'' variable are interpreted as 0x64726f77 (decimal 1685221239).
On a big-endian machine, the bytes would be interpreted as 0x776f7264
(decimal 2003792484).
If not, how do I test 4 character sequences in chunks like this?
(without laboriously testing each char individually)

Use ``memcmp(&wordLong, &longPtrVal, 4)'', if that isn't too laborious
for your taste.

Thanks, Richard and Ike. Of course it was an endian issue; I learnt C
on an old Mac, in the pre-Intel days when this worked.
Which bears out your advice not to trust it either way.

Ike, by "without laboriously testing each char" I meant fast at run time.

Richard, I appreciate your help, but why on earth would you suspect
this is a troll?
I'm happy to admit to not being skilled in C, but the message was
neither argumentative nor off-topic.
 
Richard Tobin

petertwocakes said:
Richard, I appreciate your help, but why on earth would you suspect
this is a troll?

It had certain characteristics common to recent trolls in this group,
viz:

- choose one of the well-known unportable features of C (in
this case, multi-character character literals);
- add in a less obvious error (in this case, the implication
of unaligned access) in the hope that the experts will miss it
in their rush to give the obvious answer.

I'm glad to see that it wasn't one.

-- Richard
 
petertwocakes

It had certain characteristics common to recent trolls in this group,
viz:

 - choose one of the well-known unportable features of C (in
   this case, multi-character character literals);
 - add in a less obvious error (in this case, the implication
   of unaligned access) in the hope that the experts will miss it
   in their rush to give the obvious answer.

I'm glad to see that it wasn't one.

-- Richard

Ah! :)
 
Andrey Tarasevich

petertwocakes said:
I'm trying to test a sequence of 4 characters from a ptr buffer
against a long, but the test fails even though I think they should
have the same value. e.g :


char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;

Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239
Shouldn't they be same?

If not, how do I test 4 character sequences in chunks like this?
(without laboriously testing each char individually)
I'm in a very large text buffer incrementing the current ptr until I
hit "word"

In a typical non-malicious implementation, the only thing you can test
the value of a multi-character character sequence of reasonable length
(one that fits in an 'int') against is another multi-character character
sequence of reasonable length. I.e. the implementation guarantees that 'word' is
equal to 'word' and different from 'abcd'. This is the only meaningful
use of multi-character character sequences. Trying to compare a
multi-character character sequence to some other value formed in some
other way (like re-interpretation of a character array, as in your
example) is asking for trouble. It is not guaranteed to work. And it
won't work. Stop wasting your time.

In a malicious implementation all multi-character character sequences
are actually allowed to evaluate to, say, zero, meaning that formally
they can be completely useless. Fortunately, this is not normally the
case in practice.
 
Seebs

I'm trying to test a sequence of 4 characters from a ptr buffer
against a long, but the test fails even though I think they should
have the same value. e.g :

Because the value of a multi-character character constant is
implementation-defined.
char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;
Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239
Shouldn't they be same?

Not necessarily.
If not, how do I test 4 character sequences in chunks like this?
(without laboriously testing each char individually)
I'm in a very large text buffer incrementing the current ptr until I
hit "word"

First off, learn a bit more about your implementation. On a whole lot
of modern hardware, what you're doing is going to be MUCH more expensive
than testing individual characters, because you're going to be making
unaligned accesses -- which can kill you completely or merely be slow.
If it's running at all, it's probably slow.

Secondly, there is no intrinsic right answer to the question of what order
the bytes in a long are stored. x86 systems typically have the lowest-order
byte first. You might find it more rewarding to look at the bytes, in memory
order, of 0x11223344UL. If they're 44, 33, 22, 11, then you would need to take
that into account.
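A small sketch of that probe. It stores the value in a uint32_t rather than an unsigned long, which is an assumption made here so that exactly four bytes are examined even where long is wider; probe_byte_order is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of the 32-bit value 0x11223344 into out,
   in address order. probe_byte_order is a hypothetical helper name. */
void probe_byte_order(unsigned char out[4])
{
    uint32_t probe = 0x11223344UL;

    assert(out != NULL);
    memcpy(out, &probe, sizeof probe);
}
```

On a little-endian machine out holds 44, 33, 22, 11 (hex); on a big-endian machine it holds 11, 22, 33, 44.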

But in practice: strstr(buf, "word") is quite likely to be faster than
whatever you write.

-s
 
Keith Thompson

petertwocakes said:
I'm trying to test a sequence of 4 characters from a ptr buffer
against a long, but the test fails even though I think they should
have the same value. e.g :


char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;

Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239
Shouldn't they be same?

I'm a little surprised nobody else pointed out that you never declared
wordArray or wordArrayPtr. I think what you meant was:

char wordArray[4] = {'w', 'o', 'r', 'd'};
char *wordArrayPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;

Note that the cast on the second line is unnecessary.
 
Edward A. Falk

petertwocakes said:
I'm trying to test a sequence of 4 characters from a ptr buffer
against a long, but the test fails even though I think they should
have the same value. e.g :


char word[4] = {'w', 'o', 'r', 'd'};
char *wordPtr = (char*)wordArray;
long wordLong = 'word';
long *longPtr = (long*)wordArrayPtr;
long longPtrVal = *longPtr;

Yet, in the end wordLong = 2003792484, but longPtrVal = 1685221239
Shouldn't they be same?

Print them out in hex, and you'll see the problem.

Hint: I guessed the answer before I tried it myself.
Hint2: I guessed (correctly) you were on an x86 system.
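For reference, a sketch of the hex formatting, using the two decimal values quoted in the original post rather than the multi-character constant itself. The helper name to_hex is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats the low 32 bits of value as eight lowercase hex digits.
   out must have room for at least 9 chars (8 digits plus the NUL).
   to_hex is a hypothetical helper name. */
void to_hex(unsigned long value, char out[9])
{
    assert(out != NULL);
    snprintf(out, 9, "%08lx", value & 0xFFFFFFFFUL);
}
```

to_hex(2003792484UL, h) yields "776f7264" and to_hex(1685221239UL, h) yields "64726f77": the same four bytes, 77 6f 72 64 ('w' 'o' 'r' 'd' in ASCII), in opposite orders.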
 
