writing uint16_t in a buffer


Alessio Sangalli

Hi, I am building up a buffer in memory to be sent over the network as a
data structure.

In the following example, I will omit all the hton* calls for simplicity.

Imagine I have a char buffer[16] and I have to fill it up with a number
of small, 16bit values.

What I did is to define a macro:
#define CPY16(d,s) *(uint16_t*)&d=s

and then use it as follows:
uint16_t a=0xfaaf;
uint16_t b=0xc33c;

CPY16(buffer[12], a);
CPY16(buffer[2], b);

I did this because profiling revealed that the version above is almost
10 times faster than memcpy().

Where's the catch? Is this implementation/compiler/architecture dependent?

bye, thank you
Alessio
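
For reference, a minimal compilable version of the above might look like the
following sketch (hton* calls still omitted, as in the post; the extra
parentheses around d and s are just a defensive addition, not part of the
original macro):

#include <stdint.h>

#define CPY16(d,s) (*(uint16_t *)&(d) = (s))

int main(void)
{
    char buffer[16] = {0};
    uint16_t a = 0xfaaf;
    uint16_t b = 0xc33c;

    /* each store assumes buffer[12] / buffer[2] are suitably aligned
       for a uint16_t access -- that assumption is what the thread is about */
    CPY16(buffer[12], a);
    CPY16(buffer[2],  b);

    (void)buffer;   /* buffer would be handed to the send routine here */
    return 0;
}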
 

Bartc

Alessio Sangalli said:
Hi, I am building up a buffer in memory to be sent over the network as a
data structure.

In the following example, I will omit all the hton* calls for simplicity.

Imagine I have a char buffer[16] and I have to fill it up with a number
of small, 16bit values.

What I did is to define a macro:
#define CPY16(d,s) *(uint16_t*)&d=s

and then use it as follows:
uint16_t a=0xfaaf;
uint16_t b=0xc33c;

CPY16(buffer[12], a);
CPY16(buffer[2], b);

I did this because profiling revealed that the version above is almost
10 times faster than memcpy().

Are the uint16_t's always written to an even index? In that case why not
just use an array of 8 uint16_t's?

Or, if the values written will not overlap each other, perhaps use a struct.

Where's the catch? Is this implementation/compiler/architecture dependent?

There might be issues with alignment (and possibly byte-order).
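
A sketch of the array-of-uint16_t idea, assuming every value really does land
at an even offset; note that the byte order inside each element is still the
host's, so the hton*/encoding step is still needed before sending:

#include <stdint.h>
#include <string.h>

int main(void)
{
    /* keep the data as uint16_t from the start, so alignment is guaranteed */
    uint16_t words[8] = {0};

    words[6] = 0xfaaf;   /* the slot that was buffer[12..13] */
    words[1] = 0xc33c;   /* the slot that was buffer[2..3]   */

    /* one copy at the end if a plain char buffer is still required */
    char buffer[16];
    memcpy(buffer, words, sizeof buffer);

    (void)buffer;
    return 0;
}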
 

Willem

Alessio Sangalli wrote:
) Hi, I am building up a buffer in memory to be sent over the network as a
) data structure.
)
) In the following example, I will omit all the hton* calls for simplicity.
)
) Imagine I have a char buffer[16] and I have to fill it up with a number
) of small, 16bit values.
)
) What I did is to define a macro:
) #define CPY16(d,s) *(uint16_t*)&d=s
)
) and then use it as follows:
) uint16_t a=0xfaaf;
) uint16_t b=0xc33c;
)
) CPY16(buffer[12], a);
) CPY16(buffer[2], b);
)
) I did this because profiling revealed that the version above is almost
) 10 times faster than memcpy().

Are you using memcpy() once on the entire buffer, or are you
calling it for 2 bytes ?

How much faster is it than:

buffer[12] = 0xfa;
buffer[13] = 0xaf;
buffer[2] = 0xc3;
buffer[3] = 0x3c;


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

litchie

Alessio Sangalli wrote:

) Hi, I am building up a buffer in memory to be sent over the network as a
) data structure.
)
) In the following example, I will omit all the hton* calls for simplicity.
)
) Imagine I have a char buffer[16] and I have to fill it up with a number
) of small, 16bit values.
)
) What I did is to define a macro:
) #define CPY16(d,s) *(uint16_t*)&d=s
)
) and then use it as follows:
) uint16_t a=0xfaaf;
) uint16_t b=0xc33c;
)
) CPY16(buffer[12], a);
) CPY16(buffer[2], b);
)
) I did this because profiling revealed that the version above is almost
) 10 times faster than memcpy().

Are you using memcpy() once on the entire buffer, or are you
calling it for 2 bytes ?

How much faster is it than:

buffer[12] = 0xfa;
buffer[13] = 0xaf;
buffer[2] = 0xc3;
buffer[3] = 0x3c;

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Alessio's solution takes 2 instructions to complete, with a small risk of
unaligned-access exceptions; yours takes 4 instructions but can run safely
on any platform.

Using memcpy() takes even more (function call + loop control + ...).

However, this kind of raw conversion is not recommended unless your
program will only run on machines of the same endianness.
 

Thad Smith

Alessio said:
Hi, I am building up a buffer in memory to be sent over the network as a
data structure.

In the following example, I will omit all the hton* calls for simplicity.

Imagine I have a char buffer[16] and I have to fill it up with a number
of small, 16bit values.

What I did is to define a macro:
#define CPY16(d,s) *(uint16_t*)&d=s

and then use it as follows:
uint16_t a=0xfaaf;
uint16_t b=0xc33c;

CPY16(buffer[12], a);
CPY16(buffer[2], b);

Do you want to encode the data as most significant byte first (big
endian) or least significant byte first (little endian)? This should
always be specified if you are transmitting or storing the data external
to the program.

I don't have a compiler handy to check, but you can use something like

#define INTTOBE16(d,s) ((&(d))[0]=((s)>>8)&0xff, (&(d))[1]=(s)&0xff)
#define INTTOLE16(d,s) ((&(d))[0]=(s)&0xff, (&(d))[1]=((s)>>8)&0xff)

Since both d and s are used twice in the macros, the arguments, when the
macro is used, should be free of side effects, such as incrementing a
value or calling a function.
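
A quick (untested, as noted above) sketch of how the big-endian macro would be
used with the values from the original post:

#include <stdint.h>
#include <stdio.h>

#define INTTOBE16(d,s) ((&(d))[0]=((s)>>8)&0xff, (&(d))[1]=(s)&0xff)

int main(void)
{
    char buffer[16] = {0};
    uint16_t a = 0xfaaf;

    INTTOBE16(buffer[12], a);   /* buffer[12] = 0xfa, buffer[13] = 0xaf */

    printf("%02x %02x\n",
           (unsigned char)buffer[12], (unsigned char)buffer[13]);
    return 0;
}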
 

Flash Gordon

litchie wrote, On 03/12/08 04:34:
Alessio Sangalli wrote:

) Hi, I am building up a buffer in memory to be sent over the network as a
) data structure.
)
) In the following example, I will omit all the hton* calls for simplicity.
)
) Imagine I have a char buffer[16] and I have to fill it up with a number
) of small, 16bit values.
)
) What I did is to define a macro:
) #define CPY16(d,s) *(uint16_t*)&d=s
)
) and then use it as follows:
) uint16_t a=0xfaaf;
) uint16_t b=0xc33c;
)
) CPY16(buffer[12], a);
) CPY16(buffer[2], b);
)
) I did this because profiling revealed that the version above is almost
) 10 times faster than memcpy().

Are you using memcpy() once on the entire buffer, or are you
calling it for 2 bytes ?

How much faster is it than:

buffer[12] = 0xfa;
buffer[13] = 0xaf;
buffer[2] = 0xc3;
buffer[3] = 0x3c;

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Please don't quote people's signatures, the bit typically after the "-- "
marker, although in this case it started with "SaSW, Willem".
Alessio's solution takes 2 instructions to complete, with a small risk of
unaligned-access exceptions,

There are platforms on which you won't get an exception, but the access can
be significantly slower if the data is incorrectly aligned. I believe there
are also real current systems where it will trap.
yours takes 4 instructions but can run safely
on any platform.

How many instructions it takes depends, in part, on whether you use the
compiler's optimiser.
Using memcpy() takes even more.. (function call + loop control + ...)

In that case, switch on the optimiser. Decent compilers will inline the
memcpy and then optimise the inlined code. If you are not using the
optimiser, then you should try that before even looking at how the code is
written!
However, this kind of raw conversion is not recommended unless your
program will only run on machines of the same endianness.

The OP mentioned that, for simplicity, the calls to the functions dealing
with endianness had been omitted.
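
A sketch of the memcpy-per-field approach being described; put16 is just an
illustrative helper name, and whether the call really collapses to a single
store depends on the compiler and the optimisation level, so it is worth
checking the generated code on the target:

#include <stdint.h>
#include <string.h>

/* with optimisation on, most compilers inline this memcpy into a single
   (alignment-safe) store rather than a real call with a loop */
static void put16(char *dst, uint16_t value)
{
    memcpy(dst, &value, sizeof value);
}

int main(void)
{
    char buffer[16];

    put16(&buffer[12], 0xfaaf);   /* host byte order; apply htons() first for the wire */
    put16(&buffer[2],  0xc33c);

    (void)buffer;
    return 0;
}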
 

litchie

Flash said:
litchie wrote, On 03/12/08 04:34: [...]
Alessio's solution takes 2 instructions to complete, with a small risk of
unaligned-access exceptions,
There are platforms on which you won't get an exception but it can be
significantly slowed if the data is incorrectly aligned. I believe there
are also real current systems where it will trap.

[...]

Worse, on many ARM processor types (widely used in portable devices these
days) I'm pretty sure unaligned word accesses simply get the "wrong"
bytes. So you won't even break into a debugger.

The ARM architecture spec says that an unaligned word access must raise an exception.
 

Chris Dollin

litchie said:
Flash said:
litchie wrote, On 03/12/08 04:34: [...]
Alessio's solution takes 2 instructions to complete, with a small risk of
unaligned-access exceptions,
There are platforms on which you won't get an exception but it can be
significantly slowed if the data is incorrectly aligned. I believe there
are also real current systems where it will trap.

[...]

Worse, on many ARM processor types (widely used in portable devices these
days) I'm pretty sure unaligned word accesses simply get the "wrong"
bytes. So you won't even break into a debugger.

The ARM architecture spec says that an unaligned word access must raise an exception.

The ARM in my RISC PC at home doesn't conform to that spec, then.
(It's an old StrongARM, so not really relevant to what you'd encounter
in recent embedded ARMs -- but if you did try the misaligned copy trick,
you /would/ just get the "wrong" answer, if I'm remembering correctly.)
 
