different struct sizes

Richard Bos · Nov 7, 2006

Harald van Dĳk said:
Well yeah, but the is* functions are never (counterexamples are
welcome) useful for binary data,

I can think of only one reason: a strings-like utility program. That's
system-level enough not to make this a problem.

Richard

Jordan Abel · Nov 7, 2006

On 2006-11-07, Richard Bos said:
I can think of only one reason: a strings-like utility program. That's
system-level enough not to make this a problem.

How is strings system-level?

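#include <ctype.h>
#include <stdio.h>

#define N 4  /* minimum length of a printable run worth reporting */

int main(int argc, char **argv)
{
    int c;
    int i = 0;            /* length of current run; N+1 means "already printing" */
    unsigned char buf[N];
    /* The post elided how the file is opened ("..."); as an assumption,
       read the file named on the command line, or stdin if none is given. */
    FILE *myfile = argc > 1 ? fopen(argv[1], "rb") : stdin;

    if (myfile == NULL)
        return 1;
    /* Parenthesize the assignment: "c = getc(...) != EOF" would store the
       result of the comparison (0 or 1) in c. */
    while ((c = getc(myfile)) != EOF) {
        if (isprint(c) || isspace(c)) {
            if (i < N) {
                buf[i++] = c;         /* buffer until the run is long enough */
                if (i == N) {
                    i++;
                    fwrite(buf, 1, N, stdout);
                }
            } else {
                putchar(c);           /* run already reported: echo directly */
            }
        } else {
            if (i == N + 1)
                putchar('\n');        /* terminate a reported run */
            i = 0;                    /* every run, long or short, ends here */
        }
    }
    if (i == N + 1)
        putchar('\n');
    return 0;
}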

Richard Tobin · Nov 7, 2006

Keith Thompson said:
In this newsgroup, I can and I do expect words defined in the C
standard to be used in accordance with the way the C standard defines
them.

Well, you're often going to be disappointed.

The newsgroup isn't the *only* context determining how a word should
be interpreted. Usenet isn't a standards document, it's a
conversation.

Anyone using the word "byte" here in some other sense needs to
say so.

The statement was:

sizeof(char) is 1 by definition, that doesn't mean that it's one byte

And he said it in the context of an up-thread statement

Certain platforms use 4 bytes memory for "char" and "short int"
variables

Why not start by assuming that the person is talking sense? (Of
course, you may have to change your assumption later.) Since "byte"
corresponds to "char" in C standards-speak, it would make no sense for
him to say "that doesn't mean that it's one byte" if he had meant the
C-standards-speak usage of "byte". Assuming that he is talking sense
removes the ambiguity, and your response therefore seems excessively
pedantic.

-- Richard

Richard Tobin · Nov 7, 2006

Even if stdio support is provided, I don't think it actually matters very
much about the EOF issue, in practical terms. Yes, 0xFFFF (or 0xFFFFFFFF,
or however many bits you're dealing with) can be seen as either EOF or a
character value, but (a) there's no problem when CHAR_BIT < 16, and (b)
when CHAR_BIT /is/ 16 or higher, the chances of real world data containing
a genuine character with the maximum possible value are pretty low. You'd
have to work pretty hard to find a counter-example, I think.

And anyone designing such a character set today would be nuts.
Unicode for example makes 0xFFFF be an explicit non-character,
because of its likely use for purposes such as EOF.

-- Richard

pete · Nov 7, 2006

Richard said:
Well, you're often going to be disappointed.

The newsgroup isn't the *only* context determining how a word should
be interpreted. Usenet isn't a standards document, it's a
conversation.

And he said it in the context of an up-thread statement

Certain platforms use 4 bytes memory for "char" and "short int"
variables

Why not start by assuming that the person is talking sense?

Because it makes no sense.

N869
6.5.3.4 The sizeof operator
The sizeof operator yields the size (in bytes) of its operand

Keith Thompson · Nov 7, 2006

Richard Tobin said:
Well, you're often going to be disappointed.

I often am, but somehow I muddle through.

The newsgroup isn't the *only* context determining how a word should
be interpreted. Usenet isn't a standards document, it's a
conversation.

And he said it in the context of an up-thread statement

Certain platforms use 4 bytes memory for "char" and "short int"
variables

Why not start by assuming that the person is talking sense? (Of
course, you may have to change your assumption later.) Since "byte"
corresponds to "char" in C standards-speak, it would make no sense for
him to say "that doesn't mean that it's one byte" if he had meant the
C-standards-speak usage of "byte". Assuming that he is talking sense
removes the ambiguity, and your response therefore seems excessively
pedantic.

If we're not going to use the C standard's definition of "byte" in
comp.lang.c, where are we going to use it? The point of defining
terms is to have a common vocabulary so we can discuss things without
talking past each other.

I understand that the word "byte" is often used differently outside
the context of the C programming language. People these days often
use it as a synonym for "octet", though that's inconsistent with the
original meaning of the word (which predates C).

The statement was:

sizeof(char) is 1 by definition, that doesn't mean that it's one byte

but that's exactly what it means, since sizeof yields the size of its
operand *in bytes*. That's not pedantry, it's simple correctness.

The alternative is to explicitly qualify every usage of the word
"byte" with either "(meaning 8 bits)", or "(in the sense defined in
the C standard)", or whatever, and to do the same for every other
technical word that may have more than one meaning.

Mark L Pappin · Nov 7, 2006

If it is known that a program's input is not text but (possibly)
binary, the wise programmer doesn't use fgetc() in the first
place. He uses fread(),

Or, checks feof() after fgetc() returns something that equals EOF.

mlp

Guest · Nov 7, 2006

Mark said:
Or, checks feof() after fgetc() returns something that equals EOF.

Don't forget to call ferror() too in that case.

Mark McIntyre · Nov 7, 2006

Important distinction. When I ported lcc-win32 to a DSP, each
character took two bytes (16 bits)

Strictly speaking, it took two OCTETS.

because the machine could not
address odd bytes.
octets...

Still, sizeof(char) was 1 of course.

And it was still one byte, I'm afraid.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Gordon Burditt · Nov 8, 2006

Even if stdio support is provided, I don't think it actually matters very
much about the EOF issue, in practical terms. Yes, 0xFFFF (or 0xFFFFFFFF,
or however many bits you're dealing with) can be seen as either EOF or a
character value, but (a) there's no problem when CHAR_BIT < 16, and (b)
when CHAR_BIT /is/ 16 or higher, the chances of real world data containing
a genuine character with the maximum possible value are pretty low. You'd
have to work pretty hard to find a counter-example, I think.

The chances of real-world data containing some particular unusual
combination of data that makes the program malfunction are close to
certainty if there is anything to be gained by writing MALICIOUS
code and the program accepts data from the real world.

Read some Microsoft security reports (viruses are not limited to
Microsoft but their code is the biggest target due to market share).
What are the chances of some of the exploits showing up BY ACCIDENT?
Pretty much zero. What are the chances of the exploits showing up
MALICIOUSLY? Near certainty.

santhoshkumarbss · Nov 8, 2006

Christopher said:
Any platform that uses 4 bytes of memory for "char" variables is
non-conforming. sizeof(char) is 1.

Old Wolf · Nov 8, 2006

Keith said:
A C implementation *must* allow char objects to be stored at odd byte
addresses. It can choose to align all single declared char objects,
or even char struct members, at even addresses if that makes access
easier or faster, but there can be no padding between array elements:

char arr[2];
/* either arr[0] or arr[1] is at an odd byte address */

If the hardware doesn't allow this (or makes it too expensive), then
the implementation can make bytes bigger than 8 bits.

How do you define "odd byte address" ?

I don't see why you can't have 16-bit bytes, at addresses 0, 2, 4, ...

Richard Heathfield · Nov 8, 2006

Old Wolf said:

Keith said:

A C implementation *must* allow char objects to be stored at odd byte
addresses. It can choose to align all single declared char objects,
or even char struct members, at even addresses if that makes access
easier or faster, but there can be no padding between array elements:

char arr[2];
/* either arr[0] or arr[1] is at an odd byte address */

If the hardware doesn't allow this (or makes it too expensive), then
the implementation can make bytes bigger than 8 bits.


How do you define "odd byte address" ?

I don't see why you can't have 16-bit bytes, at addresses 0, 2, 4, ...

16-bit bytes are not a problem, but of course they will be at addresses 0,
1, 2, 3...

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.

Ben Pfaff · Nov 8, 2006

Richard Heathfield said:
Old Wolf said:

16-bit bytes are not a problem, but of course they will be at addresses 0,
1, 2, 3...

I don't think that a strictly conforming program could tell the
difference. The representation of pointers doesn't have to be
flat, and there's no reason that subtraction of pointers to char
can't divide by 2. (Is there?)

Richard Heathfield · Nov 8, 2006

Ben Pfaff said:

I don't think that a strictly conforming program could tell the
difference. The representation of pointers doesn't have to be
flat, and there's no reason that subtraction of pointers to char
can't divide by 2. (Is there?)

If you have a pointer to the first character in a string, and a pointer to
the second character in the same string, those pointers are required to
differ by 1 (because arrays are stored contiguously, and sizeof(char) is
guaranteed to be 1). In an architecture where the concepts of "even
address" and "odd address" are meaningful, it seems to me that one or other
of those pointers *must* store an odd address.

What am I missing?


Ben Pfaff · Nov 8, 2006

Richard Heathfield said:
Ben Pfaff said:

If you have a pointer to the first character in a string, and a pointer to
the second character in the same string, those pointers are required to
differ by 1 (because arrays are stored contiguously, and sizeof(char) is
guaranteed to be 1).

We may be talking past one another. I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.

In an architecture where the concepts of "even
address" and "odd address" are meaningful, it seems to me that one or other
of those pointers *must* store an odd address.

I don't think the concepts of even and odd addresses would be
meaningful in the architecture that I'm envisioning.
Potentially, all pointers could be represented by odd numbers, or
by even numbers, or the bit with value 1 could be the parity of
the rest of the bits, or whatever.

Keith Thompson · Nov 8, 2006

Old Wolf said:
Keith said:

A C implementation *must* allow char objects to be stored at odd byte
addresses. It can choose to align all single declared char objects,
or even char struct members, at even addresses if that makes access
easier or faster, but there can be no padding between array elements:

char arr[2];
/* either arr[0] or arr[1] is at an odd byte address */

If the hardware doesn't allow this (or makes it too expensive), then
the implementation can make bytes bigger than 8 bits.


How do you define "odd byte address" ?

I don't see why you can't have 16-bit bytes, at addresses 0, 2, 4, ...

Actually, I don't define "odd byte address" -- but the standard does
implicitly use the concept without defining it. C99 3.2 defines the
term "alignment":

alignment

requirement that objects of a particular type be located on
storage boundaries with addresses that are particular multiples of
a byte address

Strictly speaking, 0, 2, 4, ... are not addresses; they're integers.
(void*)0, (void*)2, (void*)4, ... are addresses, but the standard
tells us nothing about their representation. On some systems I've
used (Cray vector systems), machine addresses point to 64-bit words,
but CHAR_BIT==8, and void* and char* have extra offset information.
A pointer object whose representation looks like an odd integer might
point to what I'd call an even byte address, and a pointer object
whose representation looks like an even integer might point to what
I'd call an odd byte address.

My assumption is that if p is an even address, then p+1 is an odd
address (assuming p is a character pointer), regardless of the
representation; I think that's the only model that's consistent with
the standard's concept of "alignment".

Richard Heathfield · Nov 8, 2006

Ben Pfaff said:

We may be talking past one another.

Always possible. But we seem to be converging now.

I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.

Pathologically, yes, I suppose you're correct. At least, my instinct says
it's complete nonsense, but I don't see any conforming way to establish a
mapping between the object representation of a pointer and the actual
memory address that it represents. And so my clc experience contradicts my
instincts, and tells me that it's time to fold.


Keith Thompson · Nov 9, 2006

Ben Pfaff said:
We may be talking past one another. I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.

[...]

What "numbers that represent the pointers" are you referring to?
Pointers aren't necessarily represented as numbers, and the results of
conversions between pointer and integer types are
implementation-defined.

But see my discussion of "alignment" elsethread.

Ben Pfaff · Nov 9, 2006

Keith Thompson said:
Ben Pfaff said:

We may be talking past one another. I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.


[...]

What "numbers that represent the pointers" are you referring to?
Pointers aren't necessarily represented as numbers, and the results of
conversions between pointer and integer types are
implementation-defined.

Every byte of memory represents a number. Because a pointer is
made out of bytes, it can also be said to be represented by a
number, formed by concatenating bits. For many implementations,
this number is meaningful.

Similar Threads

Why struct not globally changed in function?	1	Aug 22, 2023
Struct Member Variable Problems	1	Jun 21, 2023
Same struct, different sizes.	4	Mar 9, 2009
using libjpeg.a : "Output file write error"	7	Dec 23, 2009
struct and pointer question	33	Sep 14, 2012
padding struct	3	Jul 27, 2012
Taking address of struct temporary	5	Jul 19, 2012
struct padding is slower than struct packing	13	May 3, 2013
