Using exact-size structs to go through raw byte buffers

toe

Assume we're working on a system where CHAR_BIT == 8.

Let's say we have a raw byte buffer in memory:

char unsigned data[112];

Within this buffer is data that you got from your network card, an
ethernet frame to be exact. An ethernet frame is laid out as follows:

First 6 octets: Destination MAC address
Second 6 octets: Source MAC address
Next two octets: Protocol

In order to analyse the ethernet frame, I was thinking that maybe we
could make an exact-size struct as follows:

struct FrameHeader {
uint8 dest[6],src[6];
uint16 proto;
};

(I realise that we'd need a special compiler that will allow us to
specify no padding between members. Also I realise we'd have to be
careful about alignment).

And then do the following:

if ( 0x800 == ((struct FrameHeader const*)data)->proto )
puts("Contains an IP packet");

So far, I believe we have two issues:
1) The alignment of "proto"
2) The byte order of "proto"

Firstly, to get around the byte order issue, I was thinking of
changing the structure to:

struct FrameHeader {
uint8 dest[6],src[6];
uint8 proto[2];
};

And then making a macro function to turn a "uint8[2]" into a "uint16"
using BigEndian:

#define OCTETS_TO_16(p) ( (uint16)*(p) << 8 | (p)[1] )

so that we could do:

if ( 0x800 == OCTETS_TO_16( ((struct FrameHeader const*)data)->proto ) )
puts("Contains an IP packet");

Does this sound good?

The program that's being written is a network protocol analyser. I
myself am not writing it, but I've been asked to give a little advice.
The program is being written for MS Windows, but since the person's
using a cross-platform library for networking, I think they might try
to get it to compile for Linux and Mac as well.

On these three OSes, are there any alignment requirements for integer
types, or will the program crash if we try to access a misaligned
integer?

Also, is endianness determined by the CPU, or is it determined by the OS?
Does anyone know what the endiannesses are for the common CPUs and
OSes?

Any tips appreciated.
 
toe

Just as an aside, some of you may remember that I posted recently
looking for a fully-portable implementation of the SHA-1 algorithm. I
had some code which was supposedly fully-portable, but when I ran it
on a Sun Solaris machine it gave me the wrong answer. It didn't crash
or anything, it just gave me a wrong answer. The reason it was wrong
is that the code assumed the machine to be little-endian (which is
what Intel x86 machines are -- and yes by the way I did just Google
that 60 seconds ago), whereas the Sun machines are big-endian.
 
CBFalconer

Just as an aside, some of you may remember that I posted recently
looking for a fully-portable implementation of the SHA-1 algorithm. I
had some code which was supposedly fully-portable, but when I ran it
on a Sun Solaris machine it gave me the wrong answer. It didn't crash
or anything, it just gave me a wrong answer. The reason it was wrong
is that the code assumed the machine to be little-endian (which is
what Intel x86 machines are -- and yes by the way I did just Google
that 60 seconds ago), whereas the Sun machines are big-endian.

Then the implementation was NOT fully portable. Probably did some
unclean conversions between integers and bytes. Just a guess.
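
A minimal sketch of what a clean conversion between integers and bytes can look like. SHA-1 treats its input as big-endian 32-bit words, so the words are built with shifts instead of by casting a byte pointer to an integer pointer. The helper names `load_be32` and `store_be32` are hypothetical, and `<stdint.h>` (C99) is assumed to be available:

```c
#include <stdint.h>

/* Build a 32-bit value from four octets in big-endian order.
   Only shifts and ORs are used, so the result is the same on
   little-endian and big-endian hosts. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value as four octets, most significant first. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}
```

Code that instead does something like `*(uint32_t *)p` reads the octets in host order, which would explain a wrong answer on a big-endian machine without any crash.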
 
Nick Keighley

Assume we're working on a system where CHAR_BIT == 8.

Possibly stick an assert in somewhere so people have it drawn to their
attention if this isn't so. Many on this newsgroup will tell you to write
the code so it doesn't make this assumption.
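
One way to draw attention to the assumption is a compile-time check, which fails the build instead of waiting for a run-time assert. This is only a sketch; the helper `octet_at` is a hypothetical name used to show the check sitting next to the code that relies on it:

```c
#include <limits.h>
#include <stddef.h>

/* Refuse to compile on a platform where a byte is not exactly 8 bits. */
#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes (CHAR_BIT == 8)"
#endif

/* Hypothetical helper: with CHAR_BIT == 8, each unsigned char in the
   buffer holds exactly one octet from the wire. */
static unsigned octet_at(const unsigned char *buf, size_t i)
{
    return buf[i];
}
```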

Let's say we have a raw byte buffer in memory:

char unsigned data[112];

Within this buffer is data that you got from your network card, an
ethernet frame to be exact. An ethernet frame is laid out as follows:

First 6 octets: Destination MAC address
Second 6 octets: Source MAC address
Next two octets: Protocol

In order to analyse the ethernet frame, I was thinking that maybe we
could make an exact-size struct as follows:

struct FrameHeader {
    uint8 dest[6],src[6];
    uint16 proto;

};

(I realise that we'd need a special compiler that will allow us to
specify no padding between members. Also I realise we'd have to be
careful about alignment).

I tend not to be a fan of this technique. But in practice
if all the members are unsigned chars you should be ok.
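
A sketch of the all-`unsigned char` variant, under two assumptions that are not from the original post: the header is copied out of the buffer with `memcpy` rather than accessed through a pointer cast, and a compile-time check rejects any compiler that does insert padding. The names `read_header` and `frame_header_must_be_14_octets` are hypothetical:

```c
#include <string.h>

/* Every member is an array of unsigned char, so no member has an
   alignment requirement stricter than 1. */
struct FrameHeader {
    unsigned char dest[6];
    unsigned char src[6];
    unsigned char proto[2];
};

/* Compile-time check: the array size becomes -1, and compilation
   fails, if the compiler added padding to the struct. */
typedef char frame_header_must_be_14_octets[
    sizeof(struct FrameHeader) == 14 ? 1 : -1];

/* Copy the first 14 octets of a raw frame into a header struct.
   The caller must guarantee that data holds at least 14 octets. */
static struct FrameHeader read_header(const unsigned char *data)
{
    struct FrameHeader h;
    memcpy(&h, data, sizeof h);
    return h;
}
```

Copying costs 14 bytes per frame but sidesteps any question about whether the cast itself is well defined.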

And then do the following:

if ( 0x800 == ((struct FrameHeader const*)data)->proto )
puts("Contains an IP packet");

So far, I believe we have two issues:
1) The alignment of "proto"
2) The byte order of "proto"

Firstly, to get around the byte order issue, I was thinking of
changing the structure to:

struct FrameHeader {
    uint8 dest[6],src[6];
    uint8 proto[2];

};

Better.


And then making a macro function to turn a "uint8[2]" into a "uint16"
using BigEndian:

#define OCTETS_TO_16(p)    ( (uint16)*(p) << 8 | (p)[1] )

so that we could do:

if ( 0x800 == OCTETS_TO_16( ((struct FrameHeader const*)data)->proto ) )
puts("Contains an IP packet");

Does this sound good?


Reasonable approach.


The program that's being written is a network protocol analyser. I
myself am not writing it, but I've been asked to give a little advice.
The program is being written for MS Windows, but since the person's
using a cross-platform library for networking, I think they might try
to get it to compile for Linux and Mac as well.

On these three OSes, are there any alignment requirements for integer
types, or will the program crash if we try to access a misaligned
integer?

Probably. This tends to be a hardware rather than an OS thing, and Linux
runs on a *lot* of hardware.
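
Whatever the hardware does, a misaligned access can be avoided entirely by copying the octets into a properly aligned object first. A minimal sketch, assuming `<stdint.h>` is available; `load_u16_unaligned` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Read two octets from an arbitrary address into an aligned uint16_t.
   memcpy has no alignment requirement on its arguments. Note that the
   result is in HOST byte order, so this addresses alignment only,
   not endianness. */
static uint16_t load_u16_unaligned(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```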

Also, is endianness determined by the CPU, or is it determined by the OS?

The CPU, though some CPUs make it optional. Presumably the OS decides
then.

Does anyone know what the endiannesses are for the common CPUs and
OSes?

Any tips appreciated.

You have a special case here. Comms protocols usually specify
the byte order. Then the implementation provides macros (htons(),
ntohs() et al) to convert between platform (host) and network
(on-the-wire) byte order. If network and host byte order correspond,
the macros do nothing. To port, you just rewrite the macros. Or you
auto-detect the byte order and then use the correct macro.
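
The auto-detect variant can be sketched as below. The real conversions are `htons()`/`ntohs()` from `<arpa/inet.h>` on POSIX systems or `<winsock2.h>` on Windows; `my_ntohs`, `swap16` and `host_is_little_endian` are hypothetical stand-ins written here so the example needs no platform header. The sketch assumes the host is either plain little-endian or plain big-endian:

```c
#include <stdint.h>
#include <string.h>

/* Detect host byte order by looking at the first octet of the value 1. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Exchange the two octets of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((((unsigned)v << 8) | ((unsigned)v >> 8)) & 0xFFFFu);
}

/* Network (big-endian) to host order: a no-op on big-endian hosts. */
static uint16_t my_ntohs(uint16_t net)
{
    return host_is_little_endian() ? swap16(net) : net;
}
```

For a value copied straight out of the frame with `memcpy`, `my_ntohs(raw) == 0x0800` then tests for IP on either kind of host.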
 
Richard Bos

Let's say we have a raw byte buffer in memory:

char unsigned data[112];

Within this buffer is data that you got from your network card, an
ethernet frame to be exact. An ethernet frame is laid out as follows:

First 6 octets: Destination MAC address
Second 6 octets: Source MAC address
Next two octets: Protocol

In order to analyse the ethernet frame, I was thinking that maybe we
could make an exact-size struct as follows:

Why go to all that trouble? One thing which is guaranteed to work, as
long as your layout is correct and chars are indeed 8 bits, is

#define PROTOCOL 12
#if (ENDIAN)
#define RAW_I16(x,y) ((((int)(x)&0xff)<<8) | ((y)&0xff))
#else
#define RAW_I16(x,y) ((((int)(y)&0xff)<<8) | ((x)&0xff))
#endif

if (RAW_I16(buffer[PROTOCOL], buffer[PROTOCOL+1]) == 0x0800)
puts("Contains an IP packet.");
 
christian.bau

Why go to all that trouble? One thing which is guaranteed to work, as
long as your layout is correct and chars are indeed 8 bits, is

  #define PROTOCOL 12
  #if (ENDIAN)
    #define RAW_I16(x,y) ((((int)(x)&0xff)<<8) | ((y)&0xff))
  #else
    #define RAW_I16(x,y) ((((int)(y)&0xff)<<8) | ((x)&0xff))
  #endif

  if (RAW_I16(buffer[PROTOCOL], buffer[PROTOCOL+1]) == 0x0800)
    puts("Contains an IP packet.");

This looks very wrong. I would expect that the buffer, as an array of
unsigned char, contains exactly the same data, whether it is running
on a big-endian, little-endian or some other machine. If an IP packet is
defined by byte 12 = 0x08, byte 13 = 0x00, then you would take the
first of your two definitions for RAW_I16, no matter what your
implementation looks like.
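
That point can be sketched without any endianness conditional at all: the octets at offsets 12 and 13 are combined in wire order, and the host's byte order never enters into it. The names `ethertype`, `is_ipv4_frame` and the `ETH_*` constants are hypothetical, and the length check is an addition not present in the posts above:

```c
#include <stddef.h>

#define ETH_HEADER_LEN   14      /* 6 dest + 6 src + 2 type octets */
#define ETH_TYPE_OFFSET  12      /* type field follows both MAC addresses */
#define ETHERTYPE_IPV4   0x0800u

/* Combine the two type octets, most significant first. The buffer
   holds the same octets on every host, so no #if is needed. */
static unsigned ethertype(const unsigned char *frame)
{
    return ((unsigned)frame[ETH_TYPE_OFFSET] << 8)
         | frame[ETH_TYPE_OFFSET + 1];
}

/* True if the buffer is long enough to hold a header and the type
   field says IPv4. */
static int is_ipv4_frame(const unsigned char *frame, size_t len)
{
    return len >= ETH_HEADER_LEN && ethertype(frame) == ETHERTYPE_IPV4;
}
```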
 
