Alignment issues -- are they an issue?

  • Thread starter Tomás Ó hÉilidhe

Tomás Ó hÉilidhe

I'm doing low-level network programming at the moment, writing my
own Ethernet frames: I start off with the Destination MAC address,
then the Source MAC address, then the Protocol ID, then I have the IP packet,
then the UDP segment, and so forth.

The networking library I'm using is called Berkeley Sockets; I've
decided to go with it because I hear it's the most ubiquitous
networking library across all platforms.

Anyway, although I want my program to be as portable as possible, I
realise that it will only be portable to systems which have an
implementation of Berkeley Sockets, and which also have an exact 8-Bit
type, a 16-Bit type and a 32-Bit type (all without padding). I get
these types from stdint.h:

#include <stdint.h>

int VerifyPacketChecksum(uint8_t const *packet);

Throughout my code though, there are a few instances in which I deal
with taking 16-Bit numbers from an Ethernet frame. I know that one
possible method of doing this would be:

(p[0] << 8) | p[1]

But at the moment I have the following in my code:

ntohs( *(uint16_t const*)p )

("ntohs" is a function which converts from network byte order to host
byte order)

It's possible that "p" will not be aligned on a two-byte boundary; will
I have a problem? I realise that the C Standard
says outright that the behaviour is undefined if alignment
requirements are not met... but seeing as how I've already made
assumptions about there being an 8-Bit, 16-Bit and 32-Bit type, would
it not also be fair to assume that I can access a uint16_t regardless
of how it's aligned?

I suppose in essence what I'm asking is as follows: on the systems
where Berkeley Sockets is implemented, and where there are exact 8-Bit,
16-Bit and 32-Bit types, is it OK to read or write a uint16_t
from memory regardless of the alignment? The main platforms I have in
mind are Windows, Linux, Mac, Unix, Solaris, and possibly also Xbox 360
and PlayStation 3.

Or should I just go with (p[0] << 8) | p[1] to be safe?
 

Nick Keighley

I'm doing low-level networking programming at the moment writing my
own Ethernet frames, so I start off with the Destination MAC address,
then Source MAC address, then Protocol ID, then I have the IP packet,
then the UDP segment, and so forth.

TCP/IP (I'm including UDP in that) headers are arranged so
that on most sane architectures you don't have alignment issues.

I think they try to align on 32-bit boundaries. Tricky for 9-bit
bytes and 36-bit words, but there you are. TCP/IP pretty well assumes
you can generate (and receive) a stream of octets (8-bit bytes)
somehow. It specifies what it's going to look like on-the-wire.
How you represent it in your program is your problem (or in the
real world, the socket library's problem).

The networking library I'm using is called Berkeley Sockets; I've
decided to go with it because I hear it's the most ubiquitous
networking library across all platforms.

yes, probably

Anyway, although I want my program to be as portable as possible, I
realise that it will only be portable to systems which have an
implementation of Berkeley Sockets, and also which have an exact 8-Bit
type, a 16-Bit type and a 32-Bit type (all without padding). I get
these types from stdint.h:

I'm not sure that's necessary. Doesn't the socket library
provide the appropriate struct declarations?

If you don't have a Berkeley Sockets implementation then you
have a little more to worry about than alignment, like writing
the whole UDP stack...

    #include <stdint.h>

if this doesn't exist you can (usually) hack one together

    int VerifyPacketChecksum(uint8_t const *packet);

Throughout my code though, there are a few instances in which I deal
with taking 16-Bit numbers from an Ethernet frame.

you *are* writing your own UDP stack?

I know that one
possible method of doing this would be:

    (p[0] << 8) | p[1]

But at the moment I have the following in my code:

    ntohs(    *(uint16_t const*)p    )

("ntohs" is a function which converts from network byte order to host
byte order)

It's possible that "p" will not be aligned on a two-byte boundary, but
I'm wondering if I'll have a problem?

I have seen code blow up on this. A well-written UDP implementation
should be OK with this. This is why I'm wondering if you are the
implementor.

I realise that the C Standard
says outright that the behaviour is undefined if alignment
requirements are not met... but seeing as how I've already made
assumptions about there being an 8-Bit, 16-Bit and 32-Bit type,

which I think is a bad idea

would
it not be also fair to assume that I can access a uint16_t regardless
of how it's aligned?

no. Really no.

I suppose in essence what I'm asking is as follows: On the systems
where Berkeley Sockets is implemented, and where there are exact 8-
Bit, 16-Bit and 32-Bit types, is it OK to read or write a uint16_t
from memory regardless of the alignment?

The old (OK, Stone Age) 68000 used to do this. It had 16- and 32-bit
types (well, the compiler did), but they had to be aligned or it
trapped. I suspect some modern RISC chips still do this.

The main platforms I have in
mind are Windows, Linux, Mac, Unix, Solaris, and also possible XBox360
and Playstation 3.

Or should I just go with (p[0] << 8) | p[1] to be safe?

I go with shift and OR. It isn't much more code!

After we had the alignment trap we put this code in

p[0] << 8 + p[1]

which was "interesting"


 

Tomás Ó hÉilidhe

TCP/IP (I'm including UDP in that) headers are arranged so
that on most sane architectures you don't have alignment issues.


The only problem is that the entire frame in memory might have strange
alignment.

you *are* writing your own UDP stack?

Yup!


I have seen code blow up on this. A well written UDP implementation
should be ok with this. This is why I'm wondering if you are the
implementor.


I'm writing a program at the moment which maps a network, telling you
all the hosts present, and also telling you which routers (if any)
lead to the internet. I test for an internet connection by sending DNS
request packets to the MAC address of each of the hosts and seeing if
I get a reply. I hand-craft these DNS requests by myself, sending them
to different MAC addresses.

I go with shift and or. It isn't much more code!

After we had the alignment trap we put this code in

    p[0] << 8 + p[1]

which was "interesting"


I wonder if it'd be too "distancing" to do the following for now:

#define GET_S(p) (p[0] << 8 | p[1])

But of course then I've the issue of multiple evaluation of the macro
argument.
 

robertwessel2

It's possible that "p" will not be aligned on a two-byte boundary, but
I'm wondering if I'll have a problem? I realise that the C Standard
says outright that the behaviour is undefined if alignment
requirements are not met... but seeing as how I've already made
assumptions about there being an 8-Bit, 16-Bit and 32-Bit type, would
it not be also fair to assume that I can access a uint16_t regardless
of how it's aligned?

I suppose in essence what I'm asking is as follows: On the systems
where Berkeley Sockets is implemented, and where there are exact 8-
Bit, 16-Bit and 32-Bit types, is it OK to read or write a uint16_t
from memory regardless of the alignment? The main platforms I have in
mind are Windows, Linux, Mac, Unix, Solaris, and also possible XBox360
and Playstation 3.

Or should I just go with (p[0] << 8) | p[1] to be safe?


There are absolutely systems where an unaligned access will, or can,
fault. PPC (or POWER) and IPF (Itanium) are both examples. In both
of those cases, there are *some* unaligned accesses that can succeed (in
the case of PPC, so long as you don't cross a page boundary*, or so
long as you don't cross a cache line on IPF). Other processors have
had no direct unaligned support at all (Alpha, for example).

And all three of those systems happen to support 8, 16 and 32 bit
types.

For CPUs that don’t support unaligned accesses, or restrict them,
there are often some instructions meant for faking it by doing two
accesses and then pasting the result together. And of course you can
always fake it with a series of accesses, shifts, and whatnot. You
can sometimes convince a C compiler, with an implementation-specific
extension telling it that an item is not aligned, to generate that
sequence.

OTOH, many OS's catch the unaligned access traps, and fake it for you,
but that's at a severe performance penalty - usually on the order of
100 times what a non-trapping access costs you. You can take
advantage of that if unaligned accesses are rare, especially on
systems like PPC, where many unaligned accesses are, in fact, handled
quickly by the hardware, but you do *not* want it to happen often.

Of course in at least two OS's I know of, the fixup is *not* available
to kernel code, so if you're writing something that lives there (which
you might well be doing, since you're writing a TCP/IP stack - OTOH,
you're using Sockets, so you might be doing a user level stack), that
may not be an option.

I would advise you to use the shift/or approach. You can always
optimize later. Perhaps you can wrap it in a macro or function that
you can easily change. And a nicely named macro or function is
probably clearer than scattering those shift/or sequences all over
your code. And before you go optimizing this, see what the compiler
is actually generating - in many cases, if you're on a big-endian CPU,
and unaligned accesses work naturally, the compiler will optimize that
down to a single instruction anyway.


*There are some cases where you can cross a page boundary with an
unaligned access on PPC, assuming several serious restrictions are met
and depending on the implementation.
 

Keith Thompson

There are absolutely systems where an unaligned access will, or can,
fault. PPC (or POWER) and IPF (Itanium) are both examples. In both
of those cases, there are *some* unaligned accesses that can succeed (in
the case of PPC, so long as you don't cross a page boundary*, or so
long as you don't cross a cache line on IPF). Other processors have
had no direct unaligned support at all (Alpha, for example).
[...]

I've seen at least one system (I think it was the old RS/6000) where
an attempted unaligned access appears to succeed, but the low-order
bit of the address is silently ignored.
 

Flash Gordon

Keith Thompson wrote, On 22/09/08 18:01:
I've seen at least one system (I think it was the old RS/6000) where
an attempted unaligned access appears to succeed, but the low-order
bit of the address is silently ignored.

I had an old RS/6000 until recently, and my company still has four of them
(one is live with a 233 MHz processor; the other three will be a backup at
my office, and DR and backup at a satellite office). All of them run AIX
4.3 and software which is still under active maintenance. So such machines
are not dead yet!
 

Tim Prince

Keith said:
I've seen at least one system (I think it was the old RS/6000) where
an attempted unaligned access appears to succeed, but the low-order
bit of the address is silently ignored.

Before that, the GECOS systems ignored enough low-order bits to make an
effectively aligned address, and provided the data from that address.

As to the Itanium, possibly depending on the OS, there is optional
support for trapping to a function which fixes up a misaligned access,
but the performance penalty is prohibitive (even when no misalignment
occurs).

And then, since the introduction of SSE instructions about 10 years ago,
for "mov" operations there are both aligned instructions, which fault
on misalignment, and unaligned instructions, which support various
degrees of misalignment with varying degrees of performance penalty.
 

Ian Collins

Tomás Ó hÉilidhe said:
I wonder if it'd be too "distancing" to do the following for now:

#define GET_S(p) (p[0] << 8 | p[1])

But of course then I've the issue of multiple evaluation of the macro
argument.

Use a function.
 

Barry Schwarz

Even with all your assumptions, to which I add that your buffer is
aligned on a multiple of four, the question is: will any 2-byte field
in any message you need to process for which you would like to use
ntohs ever be located on an odd boundary? If you can't guarantee that
the answer will always and forever be "no", you have a problem just
waiting to happen.
 

Chris Dollin

Flash said:
Keith Thompson wrote, On 22/09/08 18:01:
I've seen at least one system (I think it was the old RS/6000) where
an attempted unaligned access appears to succeed, but the low-order
bit of the address is silently ignored.

I had an old RS/6000 until recently, and my company still has four of them
(one is live with a 233 MHz processor; the other three will be a backup at
my office, and DR and backup at a satellite office). All of them run AIX
4.3 and software which is still under active maintenance. So such machines
are not dead yet!

The Archimedes and RISC PCs had (or have, or some had or have) hardware
so that unaligned access to a word at P loaded the whole word from
(P & ~3) [1] and then rotated it (P & 3) bytes round. I expect you
can work out why ...

[1] Because by this time the pointer really is an integer index into
(mapped) memory.
 

Tomás Ó hÉilidhe

Thanks everybody for your helpful replies. At the moment I'm using the
following:

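#include <stdint.h>

/* Read a big-endian (network order) 16-bit value; p may have any alignment. */
uint16_t Get16(uint8_t const *const p)
{
    return ((uint16_t)(p[0]) << 8) | p[1];
}

/* Store val at p in big-endian order, one byte at a time. */
void Set16(uint8_t *const p, uint16_t const val)
{
    p[0] = val >> 8;
    p[1] = val & 0xFF;
}

/* Read a big-endian 32-bit value; p may have any alignment. */
uint32_t Get32(uint8_t const *const p)
{
    return ((uint32_t)(p[0]) << 24) | ((uint32_t)(p[1]) << 16) |
           ((uint32_t)(p[2]) << 8) | p[3];
}

/* Store val at p in big-endian order; each assignment keeps the low 8 bits. */
void Set32(uint8_t *const p, uint32_t const val)
{
    p[0] = val >> 24;
    p[1] = val >> 16;
    p[2] = val >> 8;
    p[3] = val;
}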
 
