Reading little-endian data from a file in a portable manner

T

Tim Rentsch

Ben Bacarisse said:
Cool...

I like it -- I like them all but, on reflection, I like my own the
least. However...



I has not thought of optimisation and it's a good point. Interestingly
gcc will optimise both mine, yours and Tim's first version to a register
move and a return. I thought mine would be the worst in this respect,
but it seems not. Tim's second version (the one that does not need a
wider signed type) was too clever for gcc.

This holds both for short (16-bit) and int (32-bit) versions.
[I presume this means (unsigned short) to (short) in the first case.]

I acknowledge that these results are significant. However....
some additional points to consider:

1. These optimizations occur (I assume) on a two's complement
implementation. The code I wrote was chosen partly because
it likely behaves well on ones' complement and sign/magnitude
implementations also. How will the other methods do on
such systems?

2. Now that the discussion is taking place, it's only a matter of
time before gcc will be optimizing my second version as well. :)

3. The field being converted may not be an exact width field (eg
short or int). If the field being converted is other than
a full-width regular integer type, then (a) the shorter version
of mine is usable, and (b) the other two, both having a test in
them, will likely have a conditional branch in the generated
code, and that's likely to mean they will be slower than the
non-branching approach.

4. Again if the field being converted is not a full-width field
(of some regular integer type), then it may be important to
ensure that only the bits of interest participate in the result.
My code does that already with its AND operations; the other
two would need an extra masking AND to suppress extra bits.

Having said all that, I think all three ideas have something to be
said for them, and thank you both for presenting your cases. By
the way, related to that I have a comment for Ben's method, which
if I may be allowed I will re-write just slightly (32 bit int
version):

return ux < 0x80000000 ? (int) ux : -(int)(-ux-1) - 1;

What I find nice about this way of writing it is that the
second part (ignoring the conversion to int) is

(ux + 1) - 1

or just 'ux', which gives me a way of seeing how/why it works,
and also makes it easier to see why gcc can optimize it.
 
M

Marcin Grzegorczyk

Tim said:
Ben Bacarisse said:
Marcin Grzegorczyk said:
Tim Rentsch wrote:
<snip>
/* get the two bytes into unsigned int ux and then... */
if (ux >= 0x8000u) {
int x = 0xffffu - ux; /* 0<= x<= 32767 */
return -x - 1;
}
else return ux;
<snip>
I believe you're right, by why not just this:

/* get the two bytes into unsigned int ux and then... */
return (int)(ux & 32767) - (long)(ux & 32768);

I'd prefer not to rely on a longer type being available, especially
given the context.

Easily done --

return (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);
Well, here's one I've thought up (incidentally, just a few days before
the stuff above was posted):

if (ux & 0x8000) return (int)(ux - 0x8000) - 0x7FFF - 1;
else return (int)ux;

Or, if you prefer a single-expression version:

return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux;
[snip]
I has not thought of optimisation and it's a good point. Interestingly
gcc will optimise both mine, yours and Tim's first version to a register
move and a return. I thought mine would be the worst in this respect,
but it seems not. Tim's second version (the one that does not need a
wider signed type) was too clever for gcc.

Out of curiosity, I've checked all four variants with the IAR C/C++
Compiler for the National Semiconductor CR16C (icccr16, v.2.20.1) at
work (as you may guess, I work in embedded software development), with
the maximum optimization level (optimizing for size). Only my version
was optimized into a single register move instruction. A bit
disappointing, but not really surprising; I've already noticed that
icccr16 isn't very good at optimizing bitwise operations. A bit more
surprising was that Ben's version turned out to be least size-efficient:
the compiler generated not only the branch, but also code to calculate
the expression (-1 - (0xFFFF - ux)). All that despite the fact that
CR16C is a regular two's complement architecture.
I acknowledge that these results are significant. However....
some additional points to consider:

1. These optimizations occur (I assume) on a two's complement
implementation. The code I wrote was chosen partly because
it likely behaves well on ones' complement and sign/magnitude
implementations also. How will the other methods do on
such systems?

Well, without being familiar with any such architecture, I can only
speculate what instructions a compiler could use; but note that in order
to preserve the full range of possible values, the conversion would have
to be to a wider type anyway. In that case I guess (and it's a pretty
wild guess) your first version /could/ be a bit more efficient.
[...]
3. The field being converted may not be an exact width field (eg
short or int). If the field being converted is other than
a full-width regular integer type, then (a) the shorter version
of mine is usable, and (b) the other two, both having a test in
them, will likely have a conditional branch in the generated
code, and that's likely to mean they will be slower than the
non-branching approach.

Actually I did check what GCC does when the field is 16-bit but the
types are 32-bit (IOW, the expressions exactly as written above, with a
32-bit int). On the i386 target, your first version is then a register
copy, two ANDs and a subtraction; mine is a test, branch and a
subtraction. (However, when the target is i686 or later, the branch is
replaced by a conditional move.)

(By the way, branching is slower only on deeply pipelined machines,
especially ones with branch prediction hardware. In embedded systems,
where the CPUs tend to be much simpler, branches are often cheap.)

Sure, if you're writing for portability (which this whole thread is
about, after all), you're better off picking what you find most
readable, and not worrying too much about how compilers may optimize it.
In that case, I think each of the four versions can do.
4. Again if the field being converted is not a full-width field
(of some regular integer type), then it may be important to
ensure that only the bits of interest participate in the result.
My code does that already with its AND operations; the other
two would need an extra masking AND to suppress extra bits.

True. OTOH, when the field width does match the return type, the AND
becomes a no-op...
Having said all that, I think all three ideas have something to be
said for them, and thank you both for presenting your cases. By
the way, related to that I have a comment for Ben's method, which
if I may be allowed I will re-write just slightly (32 bit int
version):

return ux < 0x80000000 ? (int) ux : -(int)(-ux-1) - 1;

What I find nice about this way of writing it is that the
second part (ignoring the conversion to int) is

(ux + 1) - 1

or just 'ux', which gives me a way of seeing how/why it works,
and also makes it easier to see why gcc can optimize it.

Yes, I think you're right. I may try this one with icccr16, but I doubt
it would come out any better than the original.
 
T

Tim Rentsch

Marcin Grzegorczyk said:
Tim said:
Ben Bacarisse said:
Tim Rentsch wrote:
<snip>
/* get the two bytes into unsigned int ux and then... */
if (ux >= 0x8000u) {
int x = 0xffffu - ux; /* 0<= x<= 32767 */
return -x - 1;
}
else return ux;
<snip>
I believe you're right, by why not just this:

/* get the two bytes into unsigned int ux and then... */
return (int)(ux & 32767) - (long)(ux & 32768);

I'd prefer not to rely on a longer type being available, especially
given the context.

Easily done --

return (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);
<snip>

Well, here's one I've thought up (incidentally, just a few days before
the stuff above was posted):

if (ux & 0x8000) return (int)(ux - 0x8000) - 0x7FFF - 1;
else return (int)ux;

Or, if you prefer a single-expression version:

return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux; [snip]
I has not thought of optimisation and it's a good point. Interestingly
gcc will optimise both mine, yours and Tim's first version to a register
move and a return. I thought mine would be the worst in this respect,
but it seems not. Tim's second version (the one that does not need a
wider signed type) was too clever for gcc.

Out of curiosity, I've checked all four variants with the IAR C/C++
Compiler for the National Semiconductor CR16C (icccr16, v.2.20.1) at
work (as you may guess, I work in embedded software development), with
the maximum optimization level (optimizing for size). Only my version
was optimized into a single register move instruction. A bit
disappointing, but not really surprising; I've already noticed that
icccr16 isn't very good at optimizing bitwise operations. A bit more
surprising was that Ben's version turned out to be least
size-efficient: the compiler generated not only the branch, but also
code to calculate the expression (-1 - (0xFFFF - ux)). All that
despite the fact that CR16C is a regular two's complement
architecture.

Interesting, thanks for the report.

Well, without being familiar with any such architecture, I can only
speculate what instructions a compiler could use; but note that in
order to preserve the full range of possible values, the conversion
would have to be to a wider type anyway.

The same is true of two's complement implementations, which are
allowed a trap representation for the distinguished value. Also
using a wider type is necessary only if the input contains the
problematic value, which may be known not to occur.
In that case I guess (and
it's a pretty wild guess) your first version /could/ be a bit more
efficient.

Seems likely, depending of course on architecture (and compiler).

[...]
3. The field being converted may not be an exact width field (eg
short or int). If the field being converted is other than
a full-width regular integer type, then (a) the shorter version
of mine is usable, and (b) the other two, both having a test in
them, will likely have a conditional branch in the generated
code, and that's likely to mean they will be slower than the
non-branching approach.

Actually I did check what GCC does when the field is 16-bit but the
types are 32-bit (IOW, the expressions exactly as written above, with
a 32-bit int). On the i386 target, your first version is then a
register copy, two ANDs and a subtraction; mine is a test, branch and
a subtraction. (However, when the target is i686 or later, the branch
is replaced by a conditional move.)

Yes, this is what I would expect.
(By the way, branching is slower only on deeply pipelined machines,
especially ones with branch prediction hardware. In embedded systems,
where the CPUs tend to be much simpler, branches are often cheap.)

Sure, if you're writing for portability (which this whole thread is
about, after all), you're better off picking what you find most
readable, and not worrying too much about how compilers may optimize
it. In that case, I think each of the four versions can do.

Well you were the one who brought up optimization. :)

As for readability, different people have different ideas about what
"more readable" means. My own metric for readability puts a high
weight on correctness; basically, the shorter the explanation for
why it's correct the more readable it is (along that axis, which
isn't the only axis, just one of the important ones). A question
about correctness is what prompted my posting on this in the first
place.

True. OTOH, when the field width does match the return type, the AND
becomes a no-op...

Yes, didn't I start off my statement with _if_ ...? The idea
was to observe another dimension of problem variability that
could affect which approach should be chosen, _if/when those cases
are relevant_.

Yes, I think you're right. I may try this one with icccr16, but I
doubt it would come out any better than the original.

If you do please let us know the results.
 
M

Marcin Grzegorczyk

[The nesting level of quotes is becoming too high for my taste. In case
the original sub-thread is unavailable, it was about a portable way to
interpret an unsigned integer as a two's complement value and return
that value as a signed integer.

Ben Bacarisse <[email protected]> proposed:
if (ux >= 0x8000u) {
int x = 0xffffu - ux; /* 0<= x<= 32767 */
return -x - 1;
}
else return ux;

Tim Rentsch <[email protected]> proposed:
#1:
return (int)(ux & 32767) - (long)(ux & 32768);

#2:
return (int)(ux & 32767) - (int)(ux/2 & 16384) - (int)(ux/2 & 16384);

I proposed:
return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux;

(plus the same written using if-else).]

Tim said:
Interesting, thanks for the report.


The same is true of two's complement implementations, which are
allowed a trap representation for the distinguished value. Also
using a wider type is necessary only if the input contains the
problematic value, which may be known not to occur.

Good point.

[snip]
Well you were the one who brought up optimization. :)

As for readability, different people have different ideas about what
"more readable" means. My own metric for readability puts a high
weight on correctness; basically, the shorter the explanation for
why it's correct the more readable it is (along that axis, which
isn't the only axis, just one of the important ones).

Right. I brought up optimization in part because every now and then I
find that doing things such that they don't break on
non-two's-complement implementations carries a penalty not only in the
time spent designing the code, but also in the code size and execution
time. I definitely aim to code as portably as possible, but I also like
to think how my code is going to perform on the most likely target
systems. (As a matter of fact, I've never had an opportunity to compile
any code for any implementation where (INT_MIN == -INT_MAX).)

BTW, I like the "shorter explanation why it's correct" metric, even
though I do often write code that requires a fair bit of explaining :)

[snip]
If you do please let us know the results.

OK, so I did try that, and the result was no better than the original;
all that needless branching and subtractions. :-( In fact, to my
further disappointment, icccr16 generated shorter code for the test (ux
& 0x8000) than for (ux >= 0x8000) (the width of int is 16 bits on that
platform).
 
T

Tim Rentsch

Marcin Grzegorczyk said:
[snip]

I proposed:
return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux;

Just a couple of minor followup comments...
Tim said:
[snip] My own metric for readability puts a high
weight on correctness; basically, the shorter the explanation for
why it's correct the more readable it is (along that axis, which
isn't the only axis, just one of the important ones).

[snip]

BTW, I like the "shorter explanation why it's correct" metric, [snip]

Thank you sir, I can live for a month on a compliment.
OK, so I did try that, and the result was no better than the original;
all that needless branching and subtractions. :-( In fact, to my
further disappointment, icccr16 generated shorter code for the test
(ux & 0x8000) than for (ux >= 0x8000) (the width of int is 16 bits on
that platform).

This comment sparked another minor observation in my brain,
about your suggested version. If 'ux' is an unsigned type
of 16 bits, then the expression

(int)(ux - 0x8000) - 0x7FFF - 1

is the same as

(int)(ux + 0x8000) - 0x7FFF - 1

since adding or subtracting 0x8000 is the same for a 16 bit
unsigned int. Now if we ignore the cast to (int), we get

(ux + 0x8000) - 0x7FFF - 1

which is very obviously just 'ux'. The compiler you're using
is able to do this optimization just by folding the constants
together (although the compiler probably did it differently
than I did, but I think you can see what I mean).

This last identity seems like a nice way (to me anyway) of
seeing quickly why this form of doing this computation
works.
 
M

Marcin Grzegorczyk

Tim said:
Marcin Grzegorczyk said:
[snip]

I proposed:
return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux;
[snip discussion of what icccr16 does with (-(int)(-ux-1) - 1) ]
This comment sparked another minor observation in my brain,
about your suggested version. If 'ux' is an unsigned type
of 16 bits, then the expression

(int)(ux - 0x8000) - 0x7FFF - 1

is the same as

(int)(ux + 0x8000) - 0x7FFF - 1

since adding or subtracting 0x8000 is the same for a 16 bit
unsigned int. Now if we ignore the cast to (int), we get

(ux + 0x8000) - 0x7FFF - 1

which is very obviously just 'ux'. The compiler you're using
is able to do this optimization just by folding the constants
together (although the compiler probably did it differently
than I did, but I think you can see what I mean).

Yes, I have no doubt it's constant folding at work, although I presume
the actual compiler logic is that (neglecting the cast) the original
expression is folded to (ux + -0x10000), and since the result is 16-bit,
the constant wraps around to 0. (In fact, GCC in that case eliminates
all those constants even with -O0, i.e. optimizations off.)
 
T

Tim Rentsch

Marcin Grzegorczyk said:
Tim said:
Marcin Grzegorczyk said:
[snip]

I proposed:
return ux & 0x8000 ? (int)(ux - 0x8000) - 0x7FFF - 1 : (int)ux;
[snip discussion of what icccr16 does with (-(int)(-ux-1) - 1) ]
This comment sparked another minor observation in my brain,
about your suggested version. If 'ux' is an unsigned type
of 16 bits, then the expression

(int)(ux - 0x8000) - 0x7FFF - 1

is the same as

(int)(ux + 0x8000) - 0x7FFF - 1

since adding or subtracting 0x8000 is the same for a 16 bit
unsigned int. Now if we ignore the cast to (int), we get

(ux + 0x8000) - 0x7FFF - 1

which is very obviously just 'ux'. The compiler you're using
is able to do this optimization just by folding the constants
together (although the compiler probably did it differently
than I did, but I think you can see what I mean).

Yes, I have no doubt it's constant folding at work, although I presume
the actual compiler logic is that (neglecting the cast) the original
expression is folded to (ux + -0x10000), and since the result is
16-bit, the constant wraps around to 0. (In fact, GCC in that case
eliminates all those constants even with -O0, i.e. optimizations off.)

Right, I have the same presumption. The transform I mentioned
is supposed to be a way to help a human reader think of it;
for people I think it's easier to flip the -0x8000 to +0x8000
than to do the wrap at the end.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top