struct padding is slower than struct packing

diegotorquemada · May 3, 2013

Hi all,

As far as I understand, struct padding is implemented in order to speed up the execution of a program. I did this small example below in order to verify this issue. However, surprisingly, my program with struct padding is USUALLY slower than the one with struct packing. Any comments?

Kind regards,

Diego Andrés

#include <stdio.h>
#include <time.h>

#define N 10000000

struct struct_con_padding
{
char a;
int b;
char c;
} x[N];

struct __attribute__((__packed__)) struct_con_packing
{
char a;
int b;
char c;
} y[N];

int main(void)
{
int i;
clock_t tic, toc;

printf("tamaño(x) = %d Mb, tamaño(y) = %d Mb\n", sizeof(x)/1024/1024, sizeof(y)/1024/1024);

tic = clock();
for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
toc = clock();
printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

tic = clock();
for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
toc = clock();
printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

return 0;
}

-------------------------------------------------------------------------------

[daalvarez@earendil ~]$ gcc -Wall -o 07_struct_padding2 07_struct_padding2..c
[daalvarez@earendil ~]$ ./07_struct_padding2
tamaño(x) = 114 Mb, tamaño(y) = 57 Mb
Tiempo con padding = 0.470000 segundos
Tiempo con packing = 0.180000 segundos

[daalvarez@earendil ~]$ gcc --version
gcc (GCC) 4.5.2
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[daalvarez@earendil ~]$

Barry Schwarz · May 3, 2013

Did you turn optimizations off?

How many times did you run the test and were the results consistent?

How large is the cache on your system? How frequently did your
program experience cache misses?

How much real memory is available to your program. How much time was
spent paging data in and out?

What else was running on the system when you ran the test?

Hi all,

As far as I understand, struct padding is implemented in order to speed up the execution of a program. I did this small example below in order to verify this issue. However, surprisingly, my program with struct padding is USUALLY slower than the one with struct packing. Any comments?

Kind regards,

Diego Andrés

#include <stdio.h>
#include <time.h>

#define N 10000000

struct struct_con_padding
{
char a;
int b;
char c;
} x[N];

struct __attribute__((__packed__)) struct_con_packing
{
char a;
int b;
char c;
} y[N];

int main(void)
{
int i;
clock_t tic, toc;

printf("tamaño(x) = %d Mb, tamaño(y) = %d Mb\n", sizeof(x)/1024/1024, sizeof(y)/1024/1024);

tic = clock();
for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
toc = clock();
printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

tic = clock();
for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
toc = clock();
printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

return 0;
}

-------------------------------------------------------------------------------

[daalvarez@earendil ~]$ gcc -Wall -o 07_struct_padding2 07_struct_padding2.c
[daalvarez@earendil ~]$ ./07_struct_padding2
tamaño(x) = 114 Mb, tamaño(y) = 57 Mb
Tiempo con padding = 0.470000 segundos
Tiempo con packing = 0.180000 segundos

[daalvarez@earendil ~]$ gcc --version
gcc (GCC) 4.5.2
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[daalvarez@earendil ~]$

glen herrmannsfeldt · May 3, 2013

As far as I understand, struct padding is implemented in
order to speed up the execution of a program.
I did this small example below in order to verify this issue.
However, surprisingly, my program with struct padding is
USUALLY slower than the one with struct packing.
Any comments?

On many processors, the aligned version is infinitely faster
than the unaligned version.

Otherwise, it depends on many factors. It is possible that keeping
more in cache will more than make up for the different access times.

-- glen

Alan Curry · May 3, 2013

[...]

struct struct_con_padding
{
char a;
int b;
char c;
} x[N];

struct __attribute__((__packed__)) struct_con_packing
{
char a;
int b;
char c;
} y[N];
[...]

tic = clock();
for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
toc = clock();
printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

tic = clock();
for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
toc = clock();
printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

Click to expand...

[...]

Part of the problem is that a simple store into the array has turned
this into a simple bandwidth test, and the shorter structure simply
wins. If the code were to access b from memory many times the aligned
version would likely win.

Also since the timing loops are storing into previously untouched memory,
a large part of what's being measured is how fast the kernel's page fault
handler can generate pages full of zeros.

James Kuyper · May 3, 2013

On 05/03/2013 04:19 AM, christian.bau wrote:
....

To avoid confusion: "Mb" stands for megabit, "MB" stands for megabyte.
One megabit = 1,000,000 bits, not 1024 x 1024 bits. One megabyte =
1,000,000 bytes, not 1024 x 1024 bytes. By using the proper units, you
would have found that 10 million elements are exactly 120 vs 60 MB. 12
vs. 6 MB per million elements.

Apparently you're right - according to Wikipedia: "In 1998 the
International Electrotechnical Commission (IEC) proposed standards for
binary prefixes and requiring the use of megabyte to strictly denote
10002 bytes and mebibyte to denote 10242 bytes. By the end of 2007, the
IEC Standard had been adopted by the IEEE, EU, and NIST. Nevertheless,
the term megabyte continues to be widely used with different meanings:"

Note the last sentence - use of MB to mean 2^20 Bytes is still quite
common. I had heard of disk makers using 10^6 byte "megabytes" to
misleadingly inflate their disk capacities, but I had never before heard
of this standard, or "mebibytes", until I did some research while
preparing to write this response.

Keith Thompson · May 3, 2013

James Kuyper said:
On 05/03/2013 04:19 AM, christian.bau wrote:
...

Apparently you're right - according to Wikipedia: "In 1998 the
International Electrotechnical Commission (IEC) proposed standards for
binary prefixes and requiring the use of megabyte to strictly denote
10002 bytes and mebibyte to denote 10242 bytes. By the end of 2007, the
IEC Standard had been adopted by the IEEE, EU, and NIST. Nevertheless,
the term megabyte continues to be widely used with different meanings:"

It's a pity superscripts don't copy and paste well.

Replace 10002 by 1000^2, and 10242 by 1024^2.

[...]

Ken Brody · May 3, 2013

On 5/2/2013 7:17 PM, (e-mail address removed) wrote:
[...]

#define N 10000000 [...]
struct __attribute__((__packed__)) struct_con_packing
{
char a;
int b;
char c;
} y[N]; [...]
for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }

[...]

Question:

Assume that the above is compiled on a 32-bit little-endian computer, such
as an Intel x86 CPU. The 6 bytes in y would end up containing the
following hex values:

51 E8 03 00 00 52

Following the "as-if" rule, couldn't the compiler optimize the loop to:

Store 32-bit value 0x0003E851 at address X
Store 16-bit value 0x5200 at address X+4

I'd be curious as to the actual code generated by the OP's compiler.
However, there are numerous possible explanations why this particular piece
of code ran faster than the padded struct version, as the other replies have
demonstrated.

On the other hand, non-optimized code on a CPU such as the RS/6000, which
forbids unaligned accesses, would run orders of magnitude slower. (Or just
crash.) The last time I used such a system, the O/S was set up to trap the
fault and emulate the access, using aligned reads/writes and bit shifting.

88888 Dihedral · May 3, 2013

(e-mail address removed)æ–¼ 2013å¹´5æœˆ3æ—¥æ˜ŸæœŸäº”UTC+8ä¸Šåˆ7æ™‚17åˆ†02ç§’å¯«é“ï¼š

Hi all,

As far as I understand, struct padding is implemented in order to speed up the execution of a program. I did this small example below in order to verify this issue. However, surprisingly, my program with struct padding is USUALLY slower than the one with struct packing. Any comments?

Kind regards,

Diego AndrÃ©s

#include <stdio.h>

#include <time.h>

#define N 10000000

Please test N in scales of ten from 100 to 1000000000L. // 32 bit

struct struct_con_padding

{

char a;

int b;

char c;

} x[N];

struct __attribute__((__packed__)) struct_con_packing

{

char a;

int b;

char c;

} y[N];

int main(void)

{

int i;

clock_t tic, toc;

printf("tamaÃ±o(x) = %d Mb, tamaÃ±o(y) = %d Mb\n", sizeof(x)/1024/1024, sizeof(y)/1024/1024);

tic = clock();

for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }

toc = clock();

printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

tic = clock();

for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }

toc = clock();

printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

return 0;

}

-------------------------------------------------------------------------------

[daalvarez@earendil ~]$ gcc -Wall -o 07_struct_padding2 07_struct_padding2.c

[daalvarez@earendil ~]$ ./07_struct_padding2

tamaÃ±o(x) = 114 Mb, tamaÃ±o(y) = 57 Mb

Tiempo con padding = 0.470000 segundos

Tiempo con packing = 0.180000 segundos

[daalvarez@earendil ~]$ gcc --version

gcc (GCC) 4.5.2

Copyright (C) 2010 Free Software Foundation, Inc.

This is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[daalvarez@earendil ~]$

Edward A. Falk · May 3, 2013

Question:

Assume that the above is compiled on a 32-bit little-endian computer, such
as an Intel x86 CPU. The 6 bytes in y would end up containing the
following hex values:

51 E8 03 00 00 52

Following the "as-if" rule, couldn't the compiler optimize the loop to:

Store 32-bit value 0x0003E851 at address X
Store 16-bit value 0x5200 at address X+4

I'd be curious as to the actual code generated by the OP's compiler.

Indeed. Any chance of seeing the assembly code generated for this?

glen herrmannsfeldt · May 3, 2013

(snip)

Disk manufacturers have used the "correct" (10**3) meaning for the SI
prefixes mostly forever (IBM's marketing literature for the 3380
mentioned that a string could hold more than "10 billion" bytes,
although they didn't use "GB" in the original literature, they did
later:
http://www-03.ibm.com/ibm/history/exhibits/storage/storage_3380c.html).
Some floppy disk formats, and CD-ROMs (but not DVDs, Blue-Ray, and
other optical media) are the exceptions. It became something of an
issue when consumer disks started getting bigger than a few tens of
megabytes started showing a meaning difference from the "expected"
size.

IBM disk drives tend not to use power of two sizes. The track size on
the 3330 is 13030 bytes (one block per track).

13030*404*19=100018280 bytes per pack. Note that 13030, 404, and 19
are all not powers of two. (The 19 track/cylinder is because of the
rotational position sensing information on the 20th track.)

-- glen

glen herrmannsfeldt · May 4, 2013

(snip)

One thing that annoys me with x86 compilers is that they compile 80
bit long double to have a size of 16 bytes instead of 10, so one
megabyte cannot hold 100,000 long doubles but only 62,500. On a modern
processor, the non-aligned access makes no difference (as far as I can
measure; but the smaller array sizes reduce bandwidth needs.

Some years ago there were complaints about slow access to double on
odd four byte boundaries. The 80486 was a 32 bit data bus, and so didn't
notice. The MS compilers, and maybe others, put allocated to four byte
boundaries. The the pentium and successors came along with 64 bit data
bus, and things slowed down.

I don't know what the natural alignment is for temporary real.

-- glen

Geoff · May 5, 2013

True, although FBA-style disks don't really either, except that
whatever the size is in some multiple of 512 or 4096, which is almost
never (ever?) a power of two.

But the point is that I don't think anyone has ever actually
documented any hard drive being sold in binary units. 10MB was always
10 million byte, often plus a smidge. After many iterations (and
years) on the Wikipedia article, the statement about hard drives was
changed to unequivocal after absolutely no evidence to the contrary
was found. If any such usage actually existed (and it could well), it
was extremely rare.

Actually, what was happening was precisely that they _weren't_ using
powers of two but neglecting to specify that fact.

Memory was sold in units of powers-of-two bytes, as in 1MB = 1,048,576
bytes and it was called one megabyte. Drive manufacturers were taking
advantage of this and rating their drives in 1MB = 1,000,000 bytes,
giving a 10 MB hard drive specification that inflated the expectation
in the customer's mind by implying that it was that much bigger than
it really was. The larger the disk, the greater the discrepancy. They
were using the MB term but not telling you that it didn't stand for
the same MB that memory manufacturers (and everyone else) were using.

Then there's the old formatted vs unformatted specification
controversy.

Then Microsoft comes along and counts bytes in one interface and
counts MB and GB in another interface where 2,446,950,400 bytes
becomes 2.27GB in one display and 2,389,600 kB in another.

Geoff · May 5, 2013

I've never really bought that argument. There is one unambiguously
correct definition for M (1,000,000), and it is used in a vast number
of places. Even within computing, M=1,000,000 is used in more
distinct places than M=1,048,576, although there are a
disproportionate number of instances of the latter uses. Other than
memory, M=1,048,576, is basically is *not* used (there are of course a
few exceptions).

Well, I can't make you buy it. It's ancient history to me now but I
remember many discussions about it in the early or mid-70's.

My electronics engineering background has taught me M = mega =
1,000,000 and I never expected a 2 MegOhm resistor to be 2^20 Ohms but
computer geeks were calling _mega_bytes_ by powers of two and those of
us who knew better had to live with it.

IEEE eventually had to step in and make up abbreviations for powers of
ten vs. powers of two.

http://en.wikipedia.org/wiki/Megabyte

Suffice to say there exist two definitions of mega, 10^6 for the
scientists, engineers and public and 2^20 for the hardware and
software geeks.

glen herrmannsfeldt · May 5, 2013

(snip)

I've never really bought that argument. There is one unambiguously
correct definition for M (1,000,000), and it is used in a vast number
of places. Even within computing, M=1,000,000 is used in more
distinct places than M=1,048,576, although there are a
disproportionate number of instances of the latter uses. Other than
memory, M=1,048,576, is basically is *not* used (there are of course a
few exceptions). Did Intel mislead people when the sold a 100MHz
processor? Were people really expecting it to run at 104,857,600Hz?
Or expecting their 19.2kb modems to run 19,660 baud? Or the
computer's bus..., or disk transfer rate..., or the LAN..., or...?

This is perhaps an Americanism. Where else* could any non-specialists
think it's reasonable that M is *not* 1,000,000?

In the S/370 days, I used to wonder if IBM prices processors in $K,
if that was $1000 or $1024.

Well, frequencies were in kHz and MHz before the question of computer
clock speed came up.

In the early years, there were computers with decimal addressing, and so
powers of 10 (or small multiples of powers of 10) capacity.

-- glen

nested struct	0	Jul 12, 2011
fork() 50 times slower on some Linux kernels?	14	Jan 9, 2012
Java is slower than C++!	46	Jun 9, 2007
Bitfields in struct	2	Oct 11, 2005
struct point not identified by gcc	39	Feb 22, 2007
avoid structure padding	2	May 11, 2006
Is there in memory tree structure faster than std::map?	11	Aug 12, 2012
union, strcpy and main()	10	Nov 29, 2011

struct padding is slower than struct packing

diegotorquemada

Barry Schwarz

glen herrmannsfeldt

Alan Curry

James Kuyper

Keith Thompson

Ken Brody

88888 Dihedral

Edward A. Falk

glen herrmannsfeldt

glen herrmannsfeldt

Geoff

Geoff

glen herrmannsfeldt

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads