Parsing a struct with bytes

ssubbarayan

Dear all,
I developed the following program:

void parsebytes(unsigned char* data);

struct info
{
    unsigned char day;
    unsigned char month;
    short year;
};

struct info info1;
struct info info2;

int
main(int argc, char *argv[])
{
    info1.day=12;
    info1.month=8;
    info1.year=2007;

    parsebytes((unsigned char*)&info1);
    system("PAUSE");
    return EXIT_SUCCESS;
}

void parsebytes(unsigned char* data)
{
    printf("day is %d\n", data[0]);
    printf("month is %d\n", data[1]);
    printf("year is %d\n", ((data[2] << 8) | data[3]));
}

The above program gives the proper values of 12 and 8 for day and month. But
for the year I always get junk. What should be done to correct this, and
where have I gone wrong?

Looking forward to your replies, and thanks in advance,

Regards,
s.subbarayan
 
Jens Thoms Toerring

ssubbarayan said:
The above program gives proper value of 12,8 for day and month.But
year value I always get junk.What should be done to correct this and
where have I gone wrong?

There are at least two aspects that lead to problems. First
of all, you can't assume that the members of a structure
follow each other directly without any "spacing" in
between. A compiler is allowed to put as many "padding bytes"
as it wants between the members of a structure. This normally
happens because of alignment requirements: some types of variables
can't start at arbitrary addresses, and the compiler must make
sure that those members start at allowed addresses.
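
A minimal sketch of how one could inspect the layout on a particular implementation, using the standard offsetof macro and sizeof. The struct is the one from the original post; the helper names print_info_layout and info_padding_bytes are made up for illustration, and the numbers reported are implementation-specific.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

struct info {
    unsigned char day;
    unsigned char month;
    short year;
};

/* Hypothetical helper: reports where each member really lives. */
void print_info_layout(void)
{
    printf("offsetof(day)       = %zu\n", offsetof(struct info, day));
    printf("offsetof(month)     = %zu\n", offsetof(struct info, month));
    printf("offsetof(year)      = %zu\n", offsetof(struct info, year));
    printf("sizeof(short)       = %zu\n", sizeof(short));
    printf("sizeof(struct info) = %zu\n", sizeof(struct info));
}

/* Hypothetical helper: bytes in the struct that belong to no member,
   i.e. padding. 0 means the members are tightly packed. */
size_t info_padding_bytes(void)
{
    return sizeof(struct info)
         - (2 * sizeof(unsigned char) + sizeof(short));
}
```

If offsetof(struct info, year) turns out not to be 2, then data[2] and data[3] in parsebytes are not the bytes of year at all.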

The second problem is that you make some assumptions about
the way a short int is stored in memory which might be wrong.
You assume that a short int consists of only two bytes and that
the most-significant byte is stored at a lower address than the
least-significant byte. Both assumptions can be correct on your
machine, but they aren't correct in general. While a short int
requires at least two 8-bit bytes (but there are also machines
with more bits in a byte, e.g. 16 bits, so a short int may be
stored in a single byte), it can be longer. And the assumption
about the ordering of the two bytes is, assuming that two 8-bit
bytes are used, only true on big-endian machines; on many
(little-endian) machines the least-significant byte is stored at
the lower address.
Regards, Jens
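
To make the byte-order point concrete, here is a small sketch. It assumes 8-bit bytes; the helper names short_to_bytes, combine_msb_first and combine_lsb_first are invented for this example and are not part of the original program.

```c
#include <string.h>  /* memcpy, size_t */

/* Hypothetical helper: copies the object representation of a short
   into buf (at most bufsize bytes) and returns sizeof(short). */
size_t short_to_bytes(short value, unsigned char *buf, size_t bufsize)
{
    size_t n = sizeof value < bufsize ? sizeof value : bufsize;
    memcpy(buf, &value, n);
    return sizeof value;
}

/* Treat the first byte as the most significant one (big-endian reading).
   This is what (data[2] << 8) | data[3] does in the original program. */
unsigned combine_msb_first(unsigned char b0, unsigned char b1)
{
    return ((unsigned)b0 << 8) | b1;
}

/* Treat the first byte as the least significant one (little-endian reading). */
unsigned combine_lsb_first(unsigned char b0, unsigned char b1)
{
    return ((unsigned)b1 << 8) | b0;
}
```

2007 is 0x07D7. A little-endian machine with a two-byte short stores it as the bytes D7 07, so combine_msb_first(0xD7, 0x07) yields 0xD707 = 55047, while combine_lsb_first(0xD7, 0x07) yields 2007.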
 
Nick Keighley

I developed the following program:

void parsebytes(unsigned char* data);

struct info
{
 unsigned char day;
 unsigned char month;
 short year;

};

struct info info1;
struct info info2;

int
main(int argc, char *argv[])
{
    info1.day=12;
    info1.month=8;
    info1.year=2007;

    parsebytes((unsigned char*)&info1);
    system("PAUSE");
    return EXIT_SUCCESS;
}

void parsebytes(unsigned char* data)
{

you missed
    #include <stdlib.h>
    #include <stdio.h>
at the beginning of your program

printf("day is %d\n", data[0]);
 printf("month is %d\n", data[1]);
 printf("year  is %d\n", ((data[2] << 8) | data[3]));

}

The above program gives proper value of 12,8 for day and month.But
year value I always get junk.

junk? what is junk? Was it 55,047?

What should be done to correct this

you have a struct you cast it to an array of unsigned char
and then you print the array of unsigned char. Why?

A lot of casts simply result in Undefined Behaviour
when you try and access one thing as another. The exception
is casting to unsigned char. You can cast any pointer to
an object to a pointer to unsigned char and print the unsigned chars.
But what you see is highly implementation dependent.
short is probably 2 bytes (but it doesn't have to be)
and the implementation is permitted to have them
in either order (so if I guessed the output of your
program correctly, that is how I did it).

and where have I gone wrong?

I don't know. What are you trying to do? If you wanted
to know the internal representation of that particular struct
on your particular implementation then perhaps you did
nothing wrong.
 
Chad

ssubbarayan said:
The above program gives proper value of 12,8 for day and month.But
year value I always get junk.

Let me guess: 55047?
What should be done to correct this and
where have I gone wrong?

Well, the best way to correct it is to have parsebytes accept a pointer to
a struct info, rather than a pointer to unsigned char.

You've gone wrong in assuming that the significance of the bytes going to
make up a short int is in decreasing order - i.e. that the most
significant byte comes first. On some systems, that's true. On others, it
isn't. Look up "endianness", "big-endian", "little-endian", and - would
you believe - "middle-endian".
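
A minimal sketch of that suggestion, assuming the struct from the original post. The function name printinfo is made up here; the point is only that the members are read by name, so neither padding nor byte order comes into play.

```c
#include <stdio.h>

struct info {
    unsigned char day;
    unsigned char month;
    short year;
};

/* Hypothetical replacement for parsebytes: takes the struct itself. */
void printinfo(const struct info *p)
{
    printf("day is %d\n", p->day);
    printf("month is %d\n", p->month);
    printf("year is %d\n", p->year);
}
```

Called as printinfo(&info1), this prints 12, 8 and 2007 whatever the implementation's layout is.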

--

How did you know that it was 55047?
 
Chad

That's what you get when you swap the 2 bytes of 2007.

But yet

printf("day is %d\n", data[0]);
printf("month is %d\n", data[1]);

Produced the 'correct' values. Do I dare ask why?

Chad
 
Bartc

Chad said:
That's what you get when you swap the 2 bytes of 2007.

But yet

printf("day is %d\n", data[0]);
printf("month is %d\n", data[1]);

Produced the 'correct' values. Do I dare ask why?

From your original post:
struct info
{
unsigned char day;
unsigned char month;
short year;
};

Presumably day and month are 1 byte each, and year is 2 bytes. It would be
difficult to get 1 byte in the wrong order.

Try switching the data[2] and data[3] in your code. (However this seems a
crazy way of accessing the fields of your date.)

(BTW it's now 2008 ...)
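
The arithmetic behind the swap can be sketched as follows. The helper name swap16 is invented for illustration and the sketch assumes a 16-bit value made of two 8-bit bytes.

```c
/* Exchange the two 8-bit halves of a 16-bit quantity. */
unsigned swap16(unsigned v)
{
    return ((v & 0xFFu) << 8) | ((v >> 8) & 0xFFu);
}
```

2007 is 0x07D7, and swapping its bytes gives 0xD707, which is 55047. In the original program the equivalent change is to print (data[3] << 8) | data[2], which only gives 2007 on an implementation where short is two 8-bit bytes stored least-significant first and year sits at offset 2.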
 
ssubbarayan

Hi,
Yes it prints 55047.
Regards,
s.subbarayan
 
ssubbarayan

Hi,
Yes, it works and prints 2007 if I swap data[2] and data[3] in my
code.
 
Nick Keighley

don't quote sigs (the bit after "-- ")

Hi,
Yes it prints 55047.

and so? Is that not what you wanted? I repeat, what are you
trying to do? What output do you want?
 
ssubbarayan

Hi,
I was expecting it to print 2007. But due to wrong byte swapping, it was
showing 55047 instead of 2007.
The idea behind asking this question is that I have a stream of data of
different data types, and an existing function already receives it and
parses it by bytes. I was trying to experiment with the structure and
see if I could extract it bytewise and get correct values. In case I am
successful, I would go ahead and implement the same in our product. When
I tried this sample, I encountered the problem I reported and
wanted to understand the reason behind it. Thanks for all your help.

Regards,
s.subbarayan
 
Ben Bacarisse

ssubbarayan said:
On Aug 12, 12:50 pm, Nick Keighley wrote:

Please edit out sig blocks like this.
I was expecting it to print 2007. But due to wrong byte swapping, it was
showing 55047 instead of 2007.
The idea behind asking this question is that I have a stream of data of
different data types, and an existing function already receives it and
parses it by bytes. I was trying to experiment with the structure and
see if I could extract it bytewise and get correct values. In case I am
successful, I would go ahead and implement the same in our product.

This sounds like you have concluded that you should not use this
method, but you are 100% right. In most cases, a program that is to
read and interpret some external, binary, data format should read it
in bytes and put these together using shifts (or arithmetic). The
format of the data usually dictates which numbered byte is the least
significant and you use this to determine how to put the value back
together.

The "other" way, where the program just takes the data and "overlays"
it onto an in-memory object, is not portable. It can even break
between compiler releases if structure padding is changed, for
example. I wanted to assure you that you have probably got it right,
despite getting the wrong answer!
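
A sketch of the byte-by-byte approach described above. The wire format here (four octets: day, month, then the year with its high octet first) is purely an assumption made for this example, as are the names INFO_WIRE_SIZE and info_from_wire; a real format would dictate its own layout.

```c
#include <stddef.h>  /* size_t */

struct info {
    unsigned char day;
    unsigned char month;
    short year;
};

/* Assumed external format, 4 octets:
   [0] day, [1] month, [2] year high octet, [3] year low octet. */
enum { INFO_WIRE_SIZE = 4 };

/* Builds *out from the external bytes using shifts, so the result does
   not depend on the host's byte order or struct padding.
   Returns 0 on success, -1 if the buffer is too short. */
int info_from_wire(struct info *out, const unsigned char *buf, size_t len)
{
    if (len < INFO_WIRE_SIZE)
        return -1;
    out->day   = buf[0];
    out->month = buf[1];
    /* Years up to 32767 fit in any short; larger values are not handled. */
    out->year  = (short)(((unsigned)buf[2] << 8) | buf[3]);
    return 0;
}
```

The shift expression is the same one the original parsebytes used. The difference is that it is applied to bytes whose order is fixed by the data format, not to the in-memory representation of a short.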
 
Nick Keighley

DON'T QUOTE THE SIG!

I was expecting it to print 2007. But due to wrong byte swapping,

it's perfectly allowable byte swapping.

it was
showing 55047 instead of 2007.
The idea behind asking this question is that I have a stream of data of
different data types, and an existing function already receives it and
parses it by bytes.

ah. Now this makes sense. You are trying to convert data
in a C program into a stream of bytes and transmit it to
another machine (or save it in a file or a database) and
then read it back.

I was trying to experiment with the structure and
see if I could extract it bytewise and get correct values. In case I am
successful, I would go ahead and implement the same in our product. When
I tried this sample, I encountered the problem I reported and
wanted to understand the reason behind it. Thanks for all your help.

what you need to do is write read/write functions for the various
primitives that you support (e.g. byte, int16, int32, float, string etc.).
These primitives fix things like representation and byte
ordering (e.g. always send the low-order bits first), hence you read
back what you wrote. structs and arrays are serialised (put into
the stream) by calling the primitives. You can roll your own
or look at XDR. Or ASN.1 (a bit more heavyweight). Or you can use text
representations. The modern way (not necessarily a compliment)
is to use XML. XML has the virtue that libraries to do this are
widely available.
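
A roll-your-own version might look roughly like this. It is only a sketch: the names put_u8, get_u8, put_u16, get_u16, put_info and get_info are invented, the 16-bit primitive sends the low-order octet first as in the example above, and buffer-size checks are left out for brevity.

```c
#include <stddef.h>  /* size_t */

struct info {
    unsigned char day;
    unsigned char month;
    short year;
};

/* Primitives: each returns the number of octets it consumed or produced. */
size_t put_u8(unsigned char *out, unsigned char v)
{
    out[0] = v;
    return 1;
}

size_t get_u8(const unsigned char *in, unsigned char *v)
{
    *v = in[0];
    return 1;
}

/* 16-bit value, low-order octet first, whatever the host does. */
size_t put_u16(unsigned char *out, unsigned v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    return 2;
}

size_t get_u16(const unsigned char *in, unsigned *v)
{
    *v = (unsigned)in[0] | ((unsigned)in[1] << 8);
    return 2;
}

/* The struct is serialised member by member through the primitives. */
size_t put_info(unsigned char *out, const struct info *p)
{
    size_t n = 0;
    n += put_u8(out + n, p->day);
    n += put_u8(out + n, p->month);
    n += put_u16(out + n, (unsigned)p->year);
    return n;
}

size_t get_info(const unsigned char *in, struct info *p)
{
    unsigned year;
    size_t n = 0;
    n += get_u8(in + n, &p->day);
    n += get_u8(in + n, &p->month);
    n += get_u16(in + n, &year);
    p->year = (short)year;  /* assumes the year fits in a short */
    return n;
}
```

Because both ends go through the same primitives, the four octets on the wire are 12, 8, 0xD7, 0x07 for the date in the original post regardless of either machine's padding or byte order.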

Consider yourself lucky! If your experiment had "worked"
you might have implemented a technique that broke as soon
as you changed the machine at one end. I've seen stuff
like this break even when the machines were from the same
manufacturer. Try mixing a Win32 with a Win64.


--
Nick Keighley

"XML is isomorphic to the subset of Lisp data
where the first item in a list is required to be atomic."
John McCarthy
 
Bart

Hi,
I was expecting it to print 2007. But due to wrong byte swapping, it was
showing 55047 instead of 2007.
The idea behind asking this question is that I have a stream of data of
different data types, and an existing function already receives it and
parses it by bytes.

In this particular case you might know the year must be, say, 1900 to
2100. Any byte swapping will give year values outside this range
(except for year 2056 which is unaffected). In that case just reverse
the two bytes when the year is not already 1900 to 2100.
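
The claim about 2056 can be checked exhaustively with a few lines. This is only a sketch for 16-bit values made of two 8-bit bytes, and the names swap16 and count_ambiguous_years are made up for the example.

```c
/* Exchange the two 8-bit halves of a 16-bit quantity. */
static unsigned swap16(unsigned v)
{
    return ((v & 0xFFu) << 8) | ((v >> 8) & 0xFFu);
}

/* Counts the years in [lo, hi] whose byte-swapped value also lies in
   [lo, hi]. If last is not a null pointer, the last such year is
   stored through it. */
int count_ambiguous_years(int lo, int hi, int *last)
{
    int count = 0;
    for (int y = lo; y <= hi; y++) {
        int s = (int)swap16((unsigned)y);
        if (s >= lo && s <= hi) {
            count++;
            if (last)
                *last = y;
        }
    }
    return count;
}
```

For the range 1900 to 2100 the count is 1 and the year is 2056 (0x0808), whose two bytes are identical.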
 
Keith Thompson

Bart said:
In this particular case you might know the year must be, say, 1900 to
2100. Any byte swapping will give year values outside this range
(except for year 2056 which is unaffected). In that case just reverse
the two bytes when the year is not already 1900 to 2100.

Sure, that will work in this particular case, but it's not a great
technique in general. Tomorrow you might have to handle data values
that are still plausible after being byte-swapped.

A more general approach is either to know in advance what the byte
ordering of the input file happens to be, or, if that's not feasible,
to have something in the file with a known value that will
unambiguously tell you the byte ordering.

For example, a file might contain fixed fields containing the values
0x0102 (16 bits) and 0x01020304 (32 bits). Examining the values of
those fields should tell you what adjustments you need to perform for
other 16-bit and 32-bit fields.

Note that there are byte orderings other than big-endian and
little-endian. For a 32-bit value, 0x02010403 is a possibility on
some systems (but not on any systems you're particularly likely to
encounter).

For reasonable portability, the only values you probably need to worry
about are (0x0102, 0x01020304) and (0x0201, 0x04030201), as long as
you treat any other values as an error.
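
A sketch of the marker-field idea. It assumes the two marker fields have already been read into native integers (for example with memcpy from the file buffer) and that the exact-width types uint16_t and uint32_t exist on the implementation; the names byte_order, detect_order, fix16 and fix32 are invented for this example.

```c
#include <stdint.h>

enum byte_order { ORDER_SAME, ORDER_SWAPPED, ORDER_UNKNOWN };

/* m16 and m32 are the marker fields as they appear when read natively.
   The writer is assumed to have stored 0x0102 and 0x01020304. */
enum byte_order detect_order(uint16_t m16, uint32_t m32)
{
    if (m16 == 0x0102u && m32 == 0x01020304ul)
        return ORDER_SAME;
    if (m16 == 0x0201u && m32 == 0x04030201ul)
        return ORDER_SWAPPED;
    return ORDER_UNKNOWN;  /* anything else is treated as an error */
}

/* Adjust a 16-bit field read natively from the same file. */
uint16_t fix16(uint16_t v, enum byte_order o)
{
    if (o != ORDER_SWAPPED)
        return v;
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Adjust a 32-bit field read natively from the same file. */
uint32_t fix32(uint32_t v, enum byte_order o)
{
    if (o != ORDER_SWAPPED)
        return v;
    return ((v & 0x000000FFul) << 24) | ((v & 0x0000FF00ul) << 8)
         | ((v & 0x00FF0000ul) >> 8)  | ((v & 0xFF000000ul) >> 24);
}
```

A reader would call detect_order once on the markers, reject the file on ORDER_UNKNOWN, and pass every other 16-bit and 32-bit field through fix16 or fix32.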
 
