hex and octal constants in various languages

Richard Maine

LC's No-Spam Newsreading account said:
- it is not obvious to find the relevant information in the
documentation :-(

I can't speak to whatever you are using as documentation for C, but you
mentioned MR&C as a source for Fortran. Dragging out my copy, most of
these issues look pretty clear in it to me.

The boz (binary, octal, and hex) literals are described in the section
on integer literal constants. The text says they are integers, in case
the section name wasn't enough clue. It then also says "The use of these
forms of constants is limited to their appearance as implicit integers
in the data statement..." That seems pretty direct to me.

For the edit descriptors, B, O, and Z are all listed only in the section
on integer (pg 197) and described as for integers.

The one thing that I would agree makes some material a little hard to
find is the way that the book separates out the new f2003 material. I
quickly found the material on boz literals on page 15. That doesn't
cover the f2003 feature of allowing those forms in some intrinsics. For
that, you need to go to the chapter on "miscellaneous enhancements", in
particular 18.9, which is even high enough level to be in the table of
contents. That section also mentions a little about the reasons for the
limitations, while page 15 was pretty much "just the facts."

I'd suggest that the bigger thing that would make this material hard to
find would be failing to make it a habit to look. That seems to go along
with your comment elsewhere that "unless one has access to The Standard
Itself, the only way one can gain experience is testing with specific
compilers".
 
James Van Buskirk

Richard Maine said:
The boz (binary, octal, and hex) literals are described in the section
on integer literal constants. The text says they are integers, in case
the section name wasn't enough clue. It then also says "The use of these
forms of constants is limited to their appearance as implicit integers
in the data statement..." That seems pretty direct to me.

For the edit descriptors, B, O, and Z are all listed only in the section
on integer (pg 197) and described as for integers.

The one thing that I would agree makes some material a little hard to
find is the way that the book separates out the new f2003 material. I
quickly found the material on boz literals on page 15. That doesn't
cover the f2003 feature of allowing those forms in some intrinsics. For
that, you need to go to the chapter on "miscellaneous enhancements", in
particular 18.9, which is even high enough level to be in the table of
contents. That section also mentions a little about the reasons for the
limitations, while page 15 was pretty much "just the facts."

There seems to have been a sea change in F03 where boz-literal-constants
can be regarded as bit sequences as arguments to INT and REAL, as in
N1723.pdf, section 13.3.3. This usage seems more consistent with the
O.P.'s expectations.

C:\gfortran\clf\bit_sequence>type bit_sequence.f90
program test
   implicit none
   real x

   x = real(z'3f800000',kind(x))
   write(*,'(f0.1,1x,z8.8)') x, transfer(x,1)
end program test

C:\gfortran\clf\bit_sequence>gfortran -std=f2003 bit_sequence.f90 -obit_sequence

C:\gfortran\clf\bit_sequence>bit_sequence
1.0 3F800000
 
James Van Buskirk

LC's No-Spam Newsreading account wrote:
Did the documentation for those specifiers give you any justification for
such thinking? It should say something like the following: "The unsigned
int argument is converted to unsigned octal (o), unsigned
decimal (u), or unsigned hexadecimal notation (x or X)... " (7.19.6.1p8).
Note, in particular, that if the argument is not an unsigned int, this
description is meaningless, which is a strong hint that the argument has
to be an unsigned int.

It seems to me that the O.P. really wants what conversion to an
unsigned type would give him. Just look at what he expects for the
Fortran output. Given that, his biggest problem is not specifying
the length of the output correctly in the format string:

C:\gfortran\clf\bit_sequence>type fmt.c
#include <stdio.h>

int main(void)
{
    short i;

    i = -2;
    printf("%hd %6.6ho %4.4hx\n", i, i, i);
    printf("%d %6.6o %4.4x\n", i, i, i);

    return 0;
}

C:\gfortran\clf\bit_sequence>gcc -std=c99 fmt.c -ofmt

C:\gfortran\clf\bit_sequence>fmt
-2 177776 fffe
-2 37777777776 fffffffe
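
The conversion to an unsigned type mentioned above can also be made explicit before the value reaches printf. The following is only a sketch: the helper name format_short_bits is made up for illustration, and it assumes the platform provides uint16_t.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the low 16 bits of a short as octal and hex.
   Converting to uint16_t wraps the value modulo 65536, so -2 becomes 65534
   no matter how wide int is. */
static int format_short_bits(short value, char *buf, size_t size)
{
    unsigned int bits = (uint16_t)value;   /* well-defined modulo conversion */

    /* bits really is an unsigned int, which is what %o and %x expect */
    return snprintf(buf, size, "%6.6o %4.4x", bits, bits);
}
```

Called with -2 and a sufficiently large buffer, this should produce the same "177776 fffe" as the first line of output above, without relying on the h length modifier.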
 
Kaz Kylheku

James Kuyper wrote:


For what it's worth, I've seen code like this:

float f;
int *p = (int*)&f;

not work. Apparently, it's a C aliasing thing; you can't depend on
pointers of one type to point to objects of another type (gcc has an
option to change this behavior). The union is a safer way to go.

A union is not safer. Storing an int to a union and accessing a float,
or vice versa, is not defined behavior.

The portable purpose of a union is to provide compact storage for a polymorphic
data representation, whereby all accesses to the union are by means of that
member which was most recently used to store to it (or another member which has
a compatible type).
 
Ben Bacarisse

Kaz Kylheku said:
A union is not safer. Storing an int to a union and accessing a float,
or vice versa, is not defined behavior.

The portable purpose of a union is to provide compact storage for a polymorphic
data representation, whereby all accesses to the union are by means of that
member which was most recently used to store to it (or another member which has
a compatible type).

This rather draconian restriction has been relaxed. As a result, it
is reasonable to use a union with an array of unsigned char:

union rep {
float f;
unsigned char rep[sizeof(float)];
};

if you don't like using explicit pointers.
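
As a rough illustration of how such a union might be used to get a hex dump, here is a sketch. The helper name float_bytes_hex is invented for this example, and it assumes 8-bit bytes so that each byte prints as two hex digits. The bytes come out in memory order, so the result depends on the machine's endianness.

```c
#include <stdio.h>

/* The union from the post above: the array member overlays the float. */
union rep {
    float f;
    unsigned char rep[sizeof(float)];
};

/* Hypothetical helper: write the bytes of f, in memory order, as hex digits.
   out must have room for 2 * sizeof(float) + 1 characters. */
static void float_bytes_hex(float f, char *out)
{
    union rep u;
    size_t k;

    u.f = f;                               /* store through the float member */
    for (k = 0; k < sizeof u.rep; k++)     /* read back through unsigned char */
        sprintf(out + 2 * k, "%02x", (unsigned int)u.rep[k]);
}
```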
 
Ben Pfaff

Ben Bacarisse said:
As a result, it is reasonable to use a union with an array of
unsigned char:

union rep {
float f;
unsigned char rep[sizeof(float)];
};

if you don't like using explicit pointers.

I like to do this kind of thing with memcpy():

#include <assert.h>
#include <string.h>

int reinterpret_float_as_int(float f)
{
    int x;
    assert(sizeof f == sizeof x);
    memcpy(&x, &f, sizeof x);
    return x;
}
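
A possible variant of the same idea uses an unsigned fixed-width type, which avoids any question about the sign bit when the result is printed with %x. This is a sketch only: the name float_bits is hypothetical, and the size check assumes float is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of the memcpy approach with an unsigned 32-bit result. */
static uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof f == sizeof bits);   /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);    /* copy the object representation */
    return bits;
}
```

On an implementation that uses IEEE 754 single precision, float_bits(1.0f) should give 0x3F800000, matching the Fortran output shown earlier in the thread.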
 
Richard Maine

James Van Buskirk said:
There seems to have been a sea change in F03 where boz-literal-constants
can be regarded as bit sequences as arguments to INT and REAL,

That's the feature I alluded to earlier. I don't think I would call it a
"sea change" though. Boz-literal constants are still very restricted in
where they can appear. The new feature was a bit of a compromise in that
it does allow them outside of data statements (that limitation was
painfully restrictive), but it avoids the problems of just allowing them
in arbitrary places. The new places allowed are contexts where the
intended meaning is both unambiguous and reasonably "obvious".

There was a proposal to allow them much more liberally, but that didn't
get very far. It had the problem that there were pretty much
irresolvable conflicts between consistency and some existing extensions.
Putting them in the INT and REAL intrinsics avoided such conflicts.
There were no inconsistent existing extensions, and it allowed the
kinds of things people (such as the OP, but he is far from the only one)
want to be able to do. It does have the disadvantage of being a bit
verbose, but in exchange for that verbosity you get something that is
obviously unambiguous.

To me, a "sea change" would be allowing them anywhere and having the
interpretation determined by context.

Hmm. I suppose I'll set followups, though people can override if they
like. This particular bit seems to have minimal relevance to C or Java
other than perhaps the side note that it has some similarities to what
appears to be allowed in them. The syntax is different, but it appears
that they also have a capability to explicitly specify that a bit
pattern is to be interpreted as a real.
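
For comparison, one way to say "interpret this bit pattern as a real" in C might look like the sketch below. The function name float_from_bits is invented here, and the code assumes a 32-bit float; it is roughly the counterpart of real(z'3f800000') in the Fortran example earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
static float float_from_bits(uint32_t bits)
{
    float f;

    assert(sizeof f == sizeof bits);   /* assumption: float is 32 bits wide */
    memcpy(&f, &bits, sizeof f);       /* copy the bits, no value conversion */
    return f;
}
```

With IEEE 754 single precision, float_from_bits(0x3F800000) should compare equal to 1.0f.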
 
jameskuyper

Louis said:
James Kuyper wrote:


For what it's worth, I've seen code like this:

float f;
int *p = (int*)&f;

not work.

Correct. However, that's not the case for unsigned char*; it's
perfectly safe to convert a pointer to any object into a pointer to
unsigned char, and to use that pointer to access the first
sizeof(object) bytes starting at that address.

... Apparently, it's a C aliasing thing; you can't depend on
pointers of one type to point to objects of another type ...

Correct. The anti-aliasing rules are intended to allow a C compiler to
generate code which, for example, assumes without bothering to check,
that writes through a pointer to float will not interact with reads
through a pointer to int.

... The union is a safer way to go.

The footnote to section 6.5.2.3p4 says "If the member used to access
the contents of a union object is not the same as the member last used
to store a value in the object, the appropriate part of the object
representation of the value is reinterpreted as an object
representation in the new type as described in 6.2.6 (a process
sometimes called "type punning"). This might be a trap
representation." That last sentence is the killer.
 
glen herrmannsfeldt

<>
<> - usage of hex constants in assignments instead of DATA ?
<> - usage of hex constants for non-integers ?
<> - usage of hex edit descriptor for non-integers ?

< Yes. All 3 of those. F2003 allows a few other places, but they are still
< pretty limited. Other posts have mentioned the reasons. In particular,
< f2003 allows them as arguments to the real and integer intrinsics. That
< provides a standard way to use them in assignments and for reals.

I thought that O and Z still didn't work with non-integer
(REAL, COMPLEX, and CHARACTER) data. I would expect TRANSFER to
work, transferring the bits to the appropriate integer.

-- glen
 
glen herrmannsfeldt

< On Mon, 15 Jun 2009, jameskuyper wrote:

<>> float f ;
<>> f=somevalue ;
<>> printf("%13.7g %11.11o %8.8X %d \n",f,f,f,sizeof(f)) ;

<> The problems with the "%o" and "%X" format specifiers in this case are
<> technically exactly the same as in the previous two cases. However,

< Hmm... what's then the "standard" way to obtain an hex dump of a float
< (in the way the Z format will do in Fortran), other than writing it to a
< binary file and using od :) ?

Copy to the appropriate sized integer using memcpy() after
casting pointers to (unsigned char*).

< I tried the above using one of the following instead of "somevalue"
< (with gcc)

< f=1.0f ; /* gives 3FF00000 instead of 3F800000 */
< f=2.0f ; /* gives 40000000 */
< f=4.0f ; /* gives 40800000 instead of 40100000 */
< f=8.0f ; /* gives 41000000 instead of 40200000 */
< C FORTRAN

I don't see how you got most of those. Note that you can't
pass (float) values to printf, as they get converted to double
along the way. You should print them with %16.16llx, with the
assumption that (double) and (long long) are the same size.

X'3FF0000000000000' (with the appropriate endianness) is
right for IEEE double.

do i=0,10
write(*,'(Z8.8)') TRANSFER(2.0**i,1)
enddo
end

will print out the single precision powers of 2.0 in Fortran.


#include <stdio.h>
#include <string.h>
#include <math.h>
int main(void) {
    float f;
    int i,j;
    for(i=0;i<=10;i++) {
        f=pow(2.0,i);
        memcpy((unsigned char*)&j,(unsigned char*)&f,sizeof(j));
        printf("%8.8x\n",j);
    }
    return 0;
}

This will, I believe, work in standard C with the assumption that
sizeof(int)==sizeof(float).

< For 2.0 I obtain the same binary representation I obtain in Fortran, for
< the other values (note all powers of 2) I obtain discrepant values. I'm
< pretty sure (checked with my old Excel spreadsheet doing the bit-by-bit
< display of a real in IEEE and VAX form) the Fortran value is the correct
< one (IEEE).

< Is this a side effect of some signed vs unsigned thing ?

No, it is (float) vs. (double).
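
The float-to-double promotion can be made visible by dumping the bits of the double that printf actually receives. A sketch, assuming a 64-bit IEEE double; the helper name double_bits_hex is made up for illustration:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: hex digits of the bits of a double, most significant
   digit first. A float argument is converted to double on the way in, just
   as it is when passed to printf. */
static void double_bits_hex(double d, char *out, size_t size)
{
    uint64_t bits;

    assert(sizeof d == sizeof bits);   /* assumption: double is 64 bits wide */
    memcpy(&bits, &d, sizeof bits);
    snprintf(out, size, "%016" PRIX64, bits);
}
```

For 1.0f this should give 3FF0000000000000, whose upper half is the 3FF00000 reported above, and for 4.0f it should give 4010000000000000.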

-- glen
 
Richard Maine

glen herrmannsfeldt said:
<>
<> - usage of hex constants in assignments instead of DATA ?
<> - usage of hex constants for non-integers ?
<> - usage of hex edit descriptor for non-integers ?

< Yes. All 3 of those. F2003 allows a few other places, but they are still
< pretty limited. Other posts have mentioned the reasons. In particular,
< f2003 allows them as arguments to the real and integer intrinsics. That
< provides a standard way to use them in assignments and for reals.

I thought that O and Z still didn't work with non-integer
(REAL, COMPLEX, and CHARACTER) data.

They don't. I think you misunderstood my response, perhaps from having
snipped slightly too much.

The above was a list of things that were nonstandard about the OP's
code. I was not saying that they were things that changed in f2003.
Indeed, nothing of this in the OP's code is any more standard in f2003
than f95. F2003 does allow hex literals in a few other contexts, but not
in quite the way done by the OP.

I would expect TRANSFER to
work, transferring the bits to the appropriate integer.

Me too.
 
Ben Bacarisse

glen herrmannsfeldt said:
In comp.lang.fortran LC's No-Spam Newsreading account wrote:
< Hmm... what's then the "standard" way to obtain an hex dump of a float
< (in the way the Z format will do in Fortran), other than writing it to a
< binary file and using od :) ?

Copy to the appropriate sized integer using memcpy() after
casting pointers to (unsigned char*).

There is no need to cast since (modern) memcpy takes void * arguments
and the conversion is implicit as in Ben Pfaff's code example
elsewhere.

#include <stdio.h>
#include <math.h>
int main() {
float f;
int i,j;
for(i=0;i<=10;i++) {
f=pow(2.0,i);
memcpy((unsigned char*)&j,(unsigned char*)&f,sizeof(j));

You can omit the casts and the ()s round j. Of course you don't have
to, I am just saying you can in order to get a cleaner example:

memcpy(&j, &f, sizeof j);
printf("%8.8x\n",j);
}
}

<snip>
 
Nick Keighley

Nick Keighley wrote:

Goes 'splody.
que?


 If you're up against platform dependencies (which you
will be in the real world), you might have to resort to #if

the trouble is the "union hack" invokes Undefined Behaviour.

#if sizeof(int) == 4
#define INT4 int
#endif

#if sizeof(long) == 8
#define INT8 long
#endif

puke. I *hate* code like this

And lots more like that.  These sorts of things are needed for dealing
with IO and external file formats, ime.

no they aren't. Define your external interfaces in terms of streams
of bytes (or octets if you want to be really cool) and write code
to read and write them into structures.
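
A small sketch of the byte-stream approach, with hypothetical helper names put_u32_be and get_u32_be and an arbitrary choice of most-significant-octet-first order. It assumes the platform provides uint32_t; nothing in it depends on the host's byte order or on sizeof(int).

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as four octets, most significant
   first, regardless of how the host lays out integers in memory. */
static void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: rebuild the value from the same four octets. */
static uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```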

Then go:

   union float_int {
     FLOAT4 f;
     INT4 i;
   } x;

Which will meet the op's request for 4 byte values.

it's possible float won't fit in *any* integer type
 
Nick Keighley

James Kuyper wrote:

For what it's worth, I've seen code like this:

   float f;
   int *p = (int*)&f;

not work.

so don't do it.

 Apparently, it's a C aliasing thing;  you can't depend on
pointers of one type to point to objects of another type

With the exception of unsigned char. An array of unsigned char
can be overlaid on any other type. No traps or other undefined
behaviour. It's the C way of getting at the representation of
an object. Which is probably what the OP wants. I already posted
code to do this.

(gcc has an option to change this behavior).

gcc is borken
 The union is a safer way to go.

noooo!
 
David Thompson


<snip: wrongly assuming, among other things, that C assignment takes
the bitpattern of a hex constant to a floating-point representation>
C defines hex constants as being unsigned ints, maybe unsigned long ints,

Nit: C integer constants, unless suffixed L or LL, are the 'lowest'
type at least as high as int sufficient to contain the value.

Unless suffixed U, decimal constants mostly stay signed:
C89: signed int, signed long, unsigned long /* exception */
C99: signed int, signed long, signed long long

but hex or octal try BOTH signed and unsigned:
C89: signed int, unsigned int, signed long, unsigned long
C99: same then signed long long, unsigned long long

Thus a 'full width' hex constant, like 0xF00D0EC0 on a 32-bit system,
will come out unsigned int, but not all values will.
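
With a C11 compiler, the type a constant ends up with can be inspected directly. The macro below is a sketch: the name TYPE_NAME is made up, and _Generic is a C11 feature that did not exist in C89 or C99.

```c
/* Hypothetical macro (C11): map an expression to the name of its type. */
#define TYPE_NAME(x) _Generic((x),              \
    int: "int",                                 \
    unsigned int: "unsigned int",               \
    long: "long",                               \
    unsigned long: "unsigned long",             \
    long long: "long long",                     \
    unsigned long long: "unsigned long long",   \
    default: "other")
```

Assuming a 32-bit int, TYPE_NAME(0xF00D0EC0) should yield "unsigned int", while the same value written in decimal, TYPE_NAME(4027387584), should yield a signed type ("long" or "long long", depending on the width of long).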
 
