standard doubt


raffamaiden

Hi all. I'm writing a program which will write some variables to an
output file. I do something like

int a = 5;
fwrite(&a, sizeof(int), 1, my_file_ptr);

This will write an int to the file pointed to by my_file_ptr. But I know
that the C standard does not specify the exact size in bytes of its
primitive types; as far as I know it only specifies that an int is an
integer type that represents a signed number, but different
implementations/operating systems can have different sizes for an int.

So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another. This would
also mean that a file generated by the program running on
one implementation will not be readable on another implementation,
unless the program also knows which implementation the instance
that generated the file was running on.
Is that right? I do not want such behavior. How can I solve this?
 

Tom St Denis

Hi all. I'm writing a program which will write some variables to an
output file. I do something like

int a = 5;
fwrite(&a, sizeof(int), 1, my_file_ptr);

This will write an int to the file pointed to by my_file_ptr. But I know
that the C standard does not specify the exact size in bytes of its
primitive types; as far as I know it only specifies that an int is an
integer type that represents a signed number, but different
implementations/operating systems can have different sizes for an int.

So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another. This would
also mean that a file generated by the program running on
one implementation will not be readable on another implementation,
unless the program also knows which implementation the instance
that generated the file was running on.
Is that right? I do not want such behavior. How can I solve this?

Serialize your data... e.g.

If you know you're only using 32 bits of the data:

unsigned char buf[4];
for (int x = 0; x < 4; x++) {
buf[0] = val & 0xFF;
val >>= 8;
}
outlen = fwrite(buf, 1, 4, outfile);

Of course, smarter would be to write a function that stores 32-bit ints
to a FILE and then just call it when needed...
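
Something along these lines, say (just a sketch; the name, the error
convention and the unsigned long parameter are my own choices, and it
assumes you only care about the low 32 bits of the value):

#include <stdio.h>

/* Write the low 32 bits of val to f, least-significant byte first,
   independent of the host's int size or byte order. */
static int store_u32(unsigned long val, FILE *f)
{
    unsigned char buf[4];
    int x;

    for (x = 0; x < 4; x++) {
        buf[x] = val & 0xFF;   /* x-th byte, low end first */
        val >>= 8;
    }
    return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
}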

Tom
 

Marc Boyer

On 10-12-2010, raffamaiden wrote:
Is that right?

Yes.
I do not want such behavior. How can I solve this?

You have to define your file format. There are several...
You can choose a text-oriented format or a binary one.
There is no 'best' solution. It depends on your needs.

Marc Boyer
 

Tim Prince

Hi all. I'm writing a program which will write some variables to an
output file. I do something like

int a = 5;
fwrite(&a, sizeof(int), 1, my_file_ptr);

This will write an int to the file pointed to by my_file_ptr. But I know
that the C standard does not specify the exact size in bytes of its
primitive types; as far as I know it only specifies that an int is an
integer type that represents a signed number, but different
implementations/operating systems can have different sizes for an int.

So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another.

How about int32_t?
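
E.g., something like this (just a sketch, assuming <stdint.h> is
available; the wrapper function is only for illustration):

#include <stdint.h>
#include <stdio.h>

/* int32_t is exactly 32 bits wherever it is defined, so the size no
   longer depends on the implementation.  The byte order of the raw
   fwrite is still implementation-defined, though. */
int write_value(FILE *my_file_ptr)
{
    int32_t a = 5;
    return fwrite(&a, sizeof a, 1, my_file_ptr) == 1 ? 0 : -1;
}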
 

raffamaiden

First off, thanks for the answers.
 You have to define your file format. There are several...
 You can choose a text-oriented format or a binary one.
 There is no 'best' solution. It depends on your needs.

I want to use a binary file format because I feel the final file will
be smaller and won't require atoi() and other functions to retrieve the
data.
An integer should be stored as 32 bits.
Serialize your data... e.g.
If you know you're only using 32 bits of the data:

unsigned char buf[4];
for (int x = 0; x < 4; x++) {
buf[0] = val & 0xFF;
val >>= 8;
}

outlen = fwrite(buf, 1, 4, outfile);

Of course, smarter would be to write a function that stores 32-bit ints
to a FILE and then just call it when needed...

I like this solution, but I have a few more questions: Is 'char'
guaranteed to be exactly 1 byte by the standard? Because if not, that
would not make sense.

Also, another question: in your example my integer, of which I use
only the low 32 bits whatever its size in memory, would be in the
variable "val". What about encoding? Does the C standard (say,
C90) specify how an integer should be encoded in memory, or not? If
not, suppose I'm running my program on two implementations A and B. A
uses two's complement encoding, while B uses sign and magnitude. So if A
saves the integer with the above C code, it will save exactly 32 bits,
but B will read back a different number because it uses another encoding
for the same 32 bits.

Also, shouldn't buf[0] be buf[x]?
 

Mark Storkamp

raffamaiden said:
Hi all. I'm writing a program which will write some variables to an
output file. I do something like

int a = 5;
fwrite(&a, sizeof(int), 1, my_file_ptr);

This will write an int to the file pointed to by my_file_ptr. But I know
that the C standard does not specify the exact size in bytes of its
primitive types; as far as I know it only specifies that an int is an
integer type that represents a signed number, but different
implementations/operating systems can have different sizes for an int.

So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another. This would
also mean that a file generated by the program running on
one implementation will not be readable on another implementation,
unless the program also knows which implementation the instance
that generated the file was running on.
Is that right? I do not want such behavior. How can I solve this?

As others have said, the better solution may be to define the format of
your file and handle all reasonable variations. I recently took another
approach when I needed to work with the very poorly designed .stl file
format. I needed 2-byte unsigned integers, 4-byte unsigned integers, 4-byte
floats and 50-byte structures, and I needed to compile and run on Windows,
Mac and Unix. At the start of the program I have lines such as:

assert(sizeof(unsigned) == 4);

Then if the asserts fail, I can adjust compiler switches in my makefile
accordingly.
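
Spelled out, the checks are just a handful of asserts along these lines
(a sketch; which C types I map to which widths is of course specific to
my setup):

#include <assert.h>

/* Sanity-check the type sizes the file format relies on. */
void check_sizes(void)
{
    assert(sizeof(unsigned short) == 2);   /* 2-byte unsigned */
    assert(sizeof(unsigned) == 4);         /* 4-byte unsigned */
    assert(sizeof(float) == 4);            /* 4-byte float    */
}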
 

Marc Boyer

On 10-12-2010, raffamaiden wrote:
First off, thanks for the answers.


I want to use a binary file format because I feel the final file will
be smaller and won't require atoi() and other functions to retrieve the
data.
An integer should be stored as 32 bits.

OK
Serialize your data... e.g.
If you know you're only using 32 bits of the data:

unsigned char buf[4];
for (int x = 0; x < 4; x++) {
buf[0] = val & 0xFF;
val >>= 8;
}

outlen = fwrite(buf, 1, 4, outfile);

Of course, smarter would be to write a function that stores 32-bit ints
to a FILE and then just call it when needed...

I like this solution, but I have a few more questions: Is 'char'
guaranteed to be exactly 1 byte by the standard? Because if not, that
would not make sense.

It is not guaranteed, but it is very common.
You have other issues, strictly looking at the standard, like
endianness (though the given code should be robust against that), or the
encoding of signed values (two's complement is not the only one).
But, considering 'common' architectures, you can assume
that chars are 8 bits long and two's complement is used. Moreover,
the macro CHAR_BIT gives you the number of bits in a char.

Marc Boyer
 

Kenny McCormack

raffamaiden said:
So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another. This would

The standard does not guarantee the existence of an implementation that
has 32 bit integers, nor of one that has 16 bit integers. So, you can't
be sure of this (what you claim in the quoted paragraph) - based solely
on the standard.

And of course, that (the standard) is all that matters in this newsgroup.

 

Morris Keesan

I like this solution, but I have a few more questions: Is 'char'
guaranteed to be exactly 1 byte by the standard?

Yes, but it doesn't guarantee that "byte" means what you think it does.
The standard requires a byte to be *at least* 8 bytes. It doesn't forbid
C from being implemented on architectures where the smallest addressable
unit of memory (or disk) is, e.g., 64 bits, nor does it place any
requirement on the size of a "byte" in implementations for that
architecture. I have used (and maintained the C compiler for) a machine
with 10-bit bytes, 10-bit chars, 20-bit ints.
 

Seebs


Hi!

You should be aware that the word "doubt", in English, has the connotation
that you were told something but disbelieve it. If you want to express
more general uncertainty, or mere lack of information, use a different
word. "Question" would probably be the best choice for something like
this, because you're asking a question.
int a = 5;
fwrite(&a, sizeof(int), 1, my_file_ptr);
This will write an int to the file pointed to by my_file_ptr. But I know
that the C standard does not specify the exact size in bytes of its
primitive types; as far as I know it only specifies that an int is an
integer type that represents a signed number, but different
implementations/operating systems can have different sizes for an int.
Yes.

So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another. This would
also mean that a file generated by the program running on
one implementation will not be readable on another implementation,
unless the program also knows which implementation the instance
that generated the file was running on.
Yes.

Is that right? I do not want such behavior. How can I solve this?

By writing something other than raw binary native types. One option would
be to pick a standard textual representation; if performance and file
size aren't a big deal, this is almost always the best choice, because it's
easy to read and debug. If that doesn't work, there are a large number
of options out there. You might find it instructive to look at code for
something like the TIFF image file format, which is quite successfully
portable across a broad range of machines.
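
As a rough sketch of the textual route (the format and the helper names
here are just illustrative):

#include <stdio.h>

/* Write and read a value as decimal text; readable on any
   implementation and trivial to inspect with a text editor. */
int save_text(FILE *f, long value)
{
    return fprintf(f, "%ld\n", value) > 0 ? 0 : -1;
}

int load_text(FILE *f, long *value)
{
    return fscanf(f, "%ld", value) == 1 ? 0 : -1;
}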

-s
 

Seebs

I like this solution, but I have a few more questions: Is 'char'
guaranteed to be exactly 1 byte by the standard? Because if not, that
would not make sense.

Well, the good news is, yes, 'char' is defined to be exactly 1 byte.

The bad news is, that's because the standard defines the word "byte" to
mean "the size of a char". It does *not* guarantee that either byte or
char means 8 bits exactly.
Also, another question: in your example my integer, of which I use
only the low 32 bits whatever its size in memory, would be in the
variable "val". What about encoding? Does the C standard (say,
C90) specify how an integer should be encoded in memory, or not?
No.

If not, suppose I'm running my program on two implementations A and B. A
uses two's complement encoding, while B uses sign and magnitude. So if A
saves the integer with the above C code, it will save exactly 32 bits,
but B will read back a different number because it uses another encoding
for the same 32 bits.

Not with the code given.
Also, shouldn't buf[0] be buf[x]?

Yes.

The key is that "& 0xFF" always gives you the bottom 8 bits of the value,
regardless of representation. So if you start out with a number
which has the value 0x12345678, it doesn't matter whether that's stored
in memory as { 0x12, 0x34, 0x56, 0x78 } or { 0x78, 0x56, 0x34, 0x12 }.
Either way, val & 0xFF will be 0x78, and val >>= 8 will convert it to
0x123456, and the next iteration will get the 0x56.
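
The read side just reverses that (a sketch, with a name of my own
choosing):

#include <stdio.h>

/* Reassemble a value from 4 bytes stored least-significant-first,
   again independent of the host's in-memory representation. */
static int load_u32(unsigned long *val, FILE *f)
{
    unsigned char buf[4];
    int x;

    if (fread(buf, 1, 4, f) != 4)
        return -1;
    *val = 0;
    for (x = 3; x >= 0; x--)
        *val = (*val << 8) | buf[x];
    return 0;
}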

-s
 

Seebs

Yes, but it doesn't guarantee that "byte" means what you think it does.
The standard requires a byte to be *at least* 8 bytes.

Bits.

(Obvious in context, to be sure.)

-s
 

Ian Collins

On 12/11/10 10:05 AM, Keith Thompson wrote:

Wow, Keith's posting from the future!
If your implementation has <stdint.h>, you might be better
off using uint32_t rather than unsigned. And if it doesn't,
there are ways to define it yourself; see, for example,
<http://www.lysator.liu.se/c/q8/index.html>.

Note also that sizeof(unsigned)==4 could be true on a system with 64-bit
unsigned and 16-bit char (though this is unlikely). You might add

assert(CHAR_BIT == 8);

or, since CHAR_BIT is a compile-time constant:

#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif

sizeof(unsigned) or sizeof(anything) is also a compile time constant, so
it can be used in compile time checks:

const unsigned test = 1/(sizeof(long) == 4);
 

Nobody

So this means that my program will write a 32-bit integer with one
implementation and a 16-bit integer with another. This would
also mean that a file generated by the program running on
one implementation will not be readable on another implementation,
unless the program also knows which implementation the instance
that generated the file was running on.
Is that right? I do not want such behavior. How can I solve this?

Aside from the issue of the precise format: if a system with 32-bit
integers writes an integer larger than 16 bits to the file, what are you
going to do when reading the file on a system with 16-bit integers?

Or if a system with 32-bit two's complement integers writes -2147483648 to
the file, what are you going to do when reading the file on a system using
sign-bit representation, where the most negative representable integer is
-2147483647?

Sometimes, it's simply not worth the trouble of accommodating anything
beyond "typical" systems. If you assume 32-bit two's complement integers,
your code will work on 99.99% of systems in current use. Additionally
assuming little-endian representation won't reduce that by much.

It's almost impossible to write a non-trivial program using nothing beyond
the C standard, so any new platform will require some degree of porting.
Assuming common behaviour simply means that porting to "unusual" platforms
will require more work *if and when* you actually port to such platforms.

BTW: a more significant issue than either word size or endianness is
alignment. Assuming support for unaligned reads will result in code which
doesn't work on many ARM CPUs, and there are more of those in use than
x86.
 

Keith Thompson

raffamaiden said:
First off, thanks for the answers.


I want to use a binary file format because I feel the final file will
be smaller and won't require atoi() and other functions to retrieve the
data.

If you use a binary file format, either you'll have to define the
exact format (and translate to and from that format when accessing
the file), or you'll have to give up on being able to read the file
on other systems. Straight binary (fwrite'ing structs directly,
for example) can make sense for files that will be used *only* by the
same program on the same system.

Text is far more portable, and you may find that the space and time
overhead of using text rather than binary isn't that much of an issue.
An integer should be stored as 32 bits.

Why? I'm not saying you're wrong, but why 32 bits in particular?

See <stdint.h> for definitions of types of particular sizes. uint32_t
might be the best thing for your purposes, at least if you don't need
negative values.
Serialize your data... e.g.
If you know you're only using 32 bits of the data:

unsigned char buf[4];
for (int x = 0; x < 4; x++) {
buf[0] = val & 0xFF;
val >>= 8;
}

outlen = fwrite(buf, 1, 4, outfile);

Of course, smarter would be to write a function that stores 32-bit ints
to a FILE and then just call it when needed...

I like this solution, but I have a few more questions: Is 'char'
guaranteed to be exactly 1 byte by the standard? Because if not, that
would not make sense.

As others have said, C defines a "byte" as the size of a char object,
which is *at least* 8 bits. (You'll see the word "byte" with other
meanings in other contexts.) If you're not dealing with DSPs and
embedded systems, you can probably get away with assuming that a byte is
8 bits -- but I suggest making the assumption explicit:

#include <limits.h>
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
/* Now we can safely assume that bytes are 8 bits. */
Also, another question: in your example my integer, of which I use
only the low 32 bits whatever its size in memory, would be in the
variable "val". What about encoding? Does the C standard (say,
C90) specify how an integer should be encoded in memory, or not? If
not, suppose I'm running my program on two implementations A and B. A
uses two's complement encoding, while B uses sign and magnitude. So if A
saves the integer with the above C code, it will save exactly 32 bits,
but B will read back a different number because it uses another encoding
for the same 32 bits.

C permits signed integers to be stored in 2's-complement,
1s'-complement, or sign-and-magnitude. (That's C99; C90 was
less specific, but I don't think you'll find an implementation
that uses anything else.) The vast majority of modern systems
use 2's-complement, but if you only write unsigned values to
files you can avoid that. If you need negative integers, you can
either define your own file format or just assume a 2's-complement
representation; the latter is less portable, but unlikely to be a
problem in practice.
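
As a sketch of what I mean (the helper names are mine): the conversion
to uint32_t is defined as reduction modulo 2^32, so the *encoding* side
is portable; it's the decoding side that assumes 2's-complement:

#include <stdint.h>

uint32_t encode_i32(int32_t v)
{
    return (uint32_t)v;    /* well defined: v modulo 2^32, so -1
                              becomes 0xFFFFFFFF everywhere */
}

int32_t decode_i32(uint32_t u)
{
    return (int32_t)u;     /* implementation-defined if u > INT32_MAX */
}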

Byte order is another issue; google "endianness" for more
information. POSIX provides byte-order conversion functions
(htonl et al); depending on POSIX further reduces portability,
but not drastically so.
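
For instance (a sketch; these wrappers are mine, and <arpa/inet.h> is
POSIX, not standard C):

#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stdio.h>

/* Write / read a 32-bit value in network byte order (big-endian). */
int write_u32_be(FILE *f, uint32_t v)
{
    uint32_t be = htonl(v);
    return fwrite(&be, sizeof be, 1, f) == 1 ? 0 : -1;
}

int read_u32_be(FILE *f, uint32_t *v)
{
    uint32_t be;
    if (fread(&be, sizeof be, 1, f) != 1)
        return -1;
    *v = ntohl(be);
    return 0;
}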

[...]
 

Keith Thompson

Mark Storkamp said:
As others have said, the better solution may be to define the format of
your file and handle all reasonable variations. I recently took another
approach when I needed to work with the very poorly designed .stl file
format. I needed 2-byte unsigned integers, 4-byte unsigned integers, 4-byte
floats and 50-byte structures, and I needed to compile and run on Windows,
Mac and Unix. At the start of the program I have lines such as:

assert(sizeof(unsigned) == 4);

Then if the asserts fail, I can adjust compiler switches in my makefile
accordingly.

If your implementation has <stdint.h>, you might be better
off using uint32_t rather than unsigned. And if it doesn't,
there are ways to define it yourself; see, for example,
<http://www.lysator.liu.se/c/q8/index.html>.

Note also that sizeof(unsigned)==4 could be true on a system with 64-bit
unsigned and 16-bit char (though this is unlikely). You might add

assert(CHAR_BIT == 8);

or, since CHAR_BIT is a compile-time constant:

#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
 

Keith Thompson

Nobody said:
Aside from the issue of the precise format: if a system with 32-bit
integers writes an integer larger than 16 bits to the file, what are you
going to do when reading the file on a system with 16-bit integers?

Or if a system with 32-bit two's complement integers writes -2147483648 to
the file, what are you going to do when reading the file on a system using
sign-bit representation, where the most negative representable integer is
-2147483647?

Good points.
Sometimes, it's simply not worth the trouble of accommodating anything
beyond "typical" systems. If you assume 32-bit two's complement integers,
your code will work on 99.99% of systems in current use. Additionally
assuming little-endian representation won't reduce that by much.

If you assume that *some* predefined signed integer type is 32-bit two's
complement, that's probably ok for the vast majority of current
(non-embedded) systems. Assuming that "int" is such a type is unwise
and unnecessary.

I certainly wouldn't assume little-endian representation for anything to
be shared with different systems. x86 happens to be dominant today, but
there's no guarantee that it always will be; there are still a
significant number of SPARC systems out there. And it's a solvable
problem anyway; you don't *have* to depend on a particular endianness.
(This is why "network byte order" exists.)
It's almost impossible to write a non-trivial program using nothing beyond
the C standard, so any new platform will require some degree of
porting.

It depends on what you're doing. If you're just reading and writing
files, you really don't need to rely on anything beyond the C standard.
Assuming common behaviour simply means that porting to "unusual" platforms
will require more work *if and when* you actually port to such platforms.

BTW: a more significant issue than either word size or endianness is
alignment. Assuming support for unaligned reads will result in code which
doesn't work on many ARM CPUs, and there are more of those in use than
x86.

Alignment is an issue only for in-memory data; it's irrelevant for
reading and writing files.
 

Keith Thompson

Ian Collins said:
On 12/11/10 10:05 AM, Keith Thompson wrote:
Wow, Keith's posting from the future!

You'll love the flying cars!

[...]
sizeof(unsigned) or sizeof(anything) is also a compile time constant,
so it can be used in compile time checks:

const unsigned test = 1/(sizeof(long) == 4);

True -- but it's not visible to the preprocessor, so you can't use
it in #if.

(Back in 1998 in comp.std.c, somebody remarked that it was nice back in
the days when sizeof could be used in #if directives. A followup said
"Must have been before my time". The followup was from Dennis Ritchie.)
 
