struct padding is slower than struct packing

Discussion in 'C Programming' started by diegotorquemada@gmail.com, May 3, 2013.

  1. Guest

    Hi all,

    As far as I understand, struct padding is implemented in order to speed up the execution of a program. I did this small example below in order to verify this issue. However, surprisingly, my program with struct padding is USUALLY slower than the one with struct packing. Any comments?

    Kind regards,

    Diego Andrés


    #include <stdio.h>
    #include <time.h>

    #define N 10000000

    struct struct_con_padding
    {
    char a;
    int b;
    char c;
    } x[N];


    struct __attribute__((__packed__)) struct_con_packing
    {
    char a;
    int b;
    char c;
    } y[N];


    int main(void)
    {
    int i;
    clock_t tic, toc;

    printf("tamaño(x) = %d Mb, tamaño(y) = %d Mb\n", sizeof(x)/1024/1024, sizeof(y)/1024/1024);

    tic = clock();
    for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
    toc = clock();
    printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

    tic = clock();
    for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
    toc = clock();
    printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);

    return 0;
    }


    -------------------------------------------------------------------------------



    [daalvarez@earendil ~]$ gcc -Wall -o 07_struct_padding2 07_struct_padding2..c
    [daalvarez@earendil ~]$ ./07_struct_padding2
    tamaño(x) = 114 Mb, tamaño(y) = 57 Mb
    Tiempo con padding = 0.470000 segundos
    Tiempo con packing = 0.180000 segundos


    [daalvarez@earendil ~]$ gcc --version
    gcc (GCC) 4.5.2
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    [daalvarez@earendil ~]$
    , May 3, 2013
    #1
    1. Advertising

  2. Did you turn optimizations off?

    How many times did you run the test and were the results consistent?

    How large is the cache on your system? How frequently did your
    program experience cache misses?

    How much real memory is available to your program. How much time was
    spent paging data in and out?

    What else was running on the system when you ran the test?

    On Thu, 2 May 2013 16:17:02 -0700 (PDT),
    wrote:

    >Hi all,
    >
    >As far as I understand, struct padding is implemented in order to speed up the execution of a program. I did this small example below in order to verify this issue. However, surprisingly, my program with struct padding is USUALLY slower than the one with struct packing. Any comments?
    >
    >Kind regards,
    >
    >Diego Andrés
    >
    >
    >#include <stdio.h>
    >#include <time.h>
    >
    >#define N 10000000
    >
    >struct struct_con_padding
    >{
    > char a;
    > int b;
    > char c;
    >} x[N];
    >
    >
    >struct __attribute__((__packed__)) struct_con_packing
    >{
    > char a;
    > int b;
    > char c;
    >} y[N];
    >
    >
    >int main(void)
    >{
    > int i;
    > clock_t tic, toc;
    >
    > printf("tamaño(x) = %d Mb, tamaño(y) = %d Mb\n", sizeof(x)/1024/1024, sizeof(y)/1024/1024);
    >
    > tic = clock();
    > for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
    > toc = clock();
    > printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);
    >
    > tic = clock();
    > for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
    > toc = clock();
    > printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);
    >
    > return 0;
    >}
    >
    >
    >-------------------------------------------------------------------------------
    >
    >
    >
    >[daalvarez@earendil ~]$ gcc -Wall -o 07_struct_padding2 07_struct_padding2.c
    >[daalvarez@earendil ~]$ ./07_struct_padding2
    >tamaño(x) = 114 Mb, tamaño(y) = 57 Mb
    >Tiempo con padding = 0.470000 segundos
    >Tiempo con packing = 0.180000 segundos
    >
    >
    >[daalvarez@earendil ~]$ gcc --version
    >gcc (GCC) 4.5.2
    >Copyright (C) 2010 Free Software Foundation, Inc.
    >This is free software; see the source for copying conditions. There is NO
    >warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    >
    >[daalvarez@earendil ~]$


    --
    Remove del for email
    Barry Schwarz, May 3, 2013
    #2
    1. Advertising

  3. wrote:

    > As far as I understand, struct padding is implemented in
    > order to speed up the execution of a program.
    > I did this small example below in order to verify this issue.
    > However, surprisingly, my program with struct padding is
    > USUALLY slower than the one with struct packing.
    > Any comments?


    On many processors, the aligned version is infinitely faster
    than the unaligned version.

    Otherwise, it depends on many factors. It is possible that keeping
    more in cache will more than make up for the different access times.

    -- glen
    glen herrmannsfeldt, May 3, 2013
    #3
  4. Alan Curry Guest

    In article <>,
    Robert Wessel <> wrote:
    >On Thu, 2 May 2013 16:17:02 -0700 (PDT),
    >wrote:
    >

    [...]
    >>
    >>struct struct_con_padding
    >>{
    >> char a;
    >> int b;
    >> char c;
    >>} x[N];
    >>
    >>
    >>struct __attribute__((__packed__)) struct_con_packing
    >>{
    >> char a;
    >> int b;
    >> char c;
    >>} y[N];
    >>

    [...]
    >>
    >> tic = clock();
    >> for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
    >> toc = clock();
    >> printf("Tiempo con padding = %lf segundos\n",

    >(double)(toc-tic)/CLOCKS_PER_SEC);
    >>
    >> tic = clock();
    >> for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
    >> toc = clock();
    >> printf("Tiempo con packing = %lf segundos\n",

    >(double)(toc-tic)/CLOCKS_PER_SEC);
    >>

    [...]
    >
    >
    >Part of the problem is that a simple store into the array has turned
    >this into a simple bandwidth test, and the shorter structure simply
    >wins. If the code were to access b from memory many times the aligned
    >version would likely win.


    Also since the timing loops are storing into previously untouched memory,
    a large part of what's being measured is how fast the kernel's page fault
    handler can generate pages full of zeros.

    --
    Alan Curry
    Alan Curry, May 3, 2013
    #4
  5. James Kuyper Guest

    On 05/03/2013 04:19 AM, christian.bau wrote:
    ....
    > To avoid confusion: "Mb" stands for megabit, "MB" stands for megabyte.
    > One megabit = 1,000,000 bits, not 1024 x 1024 bits. One megabyte =
    > 1,000,000 bytes, not 1024 x 1024 bytes. By using the proper units, you
    > would have found that 10 million elements are exactly 120 vs 60 MB. 12
    > vs. 6 MB per million elements.


    Apparently you're right - according to Wikipedia: "In 1998 the
    International Electrotechnical Commission (IEC) proposed standards for
    binary prefixes and requiring the use of megabyte to strictly denote
    10002 bytes and mebibyte to denote 10242 bytes. By the end of 2007, the
    IEC Standard had been adopted by the IEEE, EU, and NIST. Nevertheless,
    the term megabyte continues to be widely used with different meanings:"

    Note the last sentence - use of MB to mean 2^20 Bytes is still quite
    common. I had heard of disk makers using 10^6 byte "megabytes" to
    misleadingly inflate their disk capacities, but I had never before heard
    of this standard, or "mebibytes", until I did some research while
    preparing to write this response.
    --
    James Kuyper
    James Kuyper, May 3, 2013
    #5
  6. James Kuyper <> writes:
    > On 05/03/2013 04:19 AM, christian.bau wrote:
    > ...
    >> To avoid confusion: "Mb" stands for megabit, "MB" stands for megabyte.
    >> One megabit = 1,000,000 bits, not 1024 x 1024 bits. One megabyte =
    >> 1,000,000 bytes, not 1024 x 1024 bytes. By using the proper units, you
    >> would have found that 10 million elements are exactly 120 vs 60 MB. 12
    >> vs. 6 MB per million elements.

    >
    > Apparently you're right - according to Wikipedia: "In 1998 the
    > International Electrotechnical Commission (IEC) proposed standards for
    > binary prefixes and requiring the use of megabyte to strictly denote
    > 10002 bytes and mebibyte to denote 10242 bytes. By the end of 2007, the
    > IEC Standard had been adopted by the IEEE, EU, and NIST. Nevertheless,
    > the term megabyte continues to be widely used with different meanings:"


    It's a pity superscripts don't copy and paste well.

    Replace 10002 by 1000^2, and 10242 by 1024^2.

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, May 3, 2013
    #6
  7. Ken Brody Guest

    On 5/2/2013 7:17 PM, wrote:
    [...]
    > #define N 10000000

    [...]
    > struct __attribute__((__packed__)) struct_con_packing
    > {
    > char a;
    > int b;
    > char c;
    > } y[N];

    [...]
    > for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }

    [...]

    Question:

    Assume that the above is compiled on a 32-bit little-endian computer, such
    as an Intel x86 CPU. The 6 bytes in y would end up containing the
    following hex values:

    51 E8 03 00 00 52

    Following the "as-if" rule, couldn't the compiler optimize the loop to:

    Store 32-bit value 0x0003E851 at address X
    Store 16-bit value 0x5200 at address X+4

    I'd be curious as to the actual code generated by the OP's compiler.
    However, there are numerous possible explanations why this particular piece
    of code ran faster than the padded struct version, as the other replies have
    demonstrated.

    On the other hand, non-optimized code on a CPU such as the RS/6000, which
    forbids unaligned accesses, would run orders of magnitude slower. (Or just
    crash.) The last time I used such a system, the O/S was set up to trap the
    fault and emulate the access, using aligned reads/writes and bit shifting.
    Ken Brody, May 3, 2013
    #7
  8. æ–¼ 2013å¹´5月3日星期五UTC+8上åˆ7時17分02秒寫é“:
    > Hi all,
    >
    >
    >
    > As far as I understand, struct padding is implemented in order to speed up the execution of a program. I did this small example below in order to verify this issue. However, surprisingly, my program with struct padding is USUALLY slower than the one with struct packing. Any comments?
    >
    >
    >
    > Kind regards,
    >
    >
    >
    > Diego Andrés
    >
    >
    >
    >
    >
    > #include <stdio.h>
    >
    > #include <time.h>
    >
    >
    >
    > #define N 10000000
    >
    >


    Please test N in scales of ten from 100 to 1000000000L. // 32 bit
    >
    > struct struct_con_padding
    >
    > {
    >
    > char a;
    >
    > int b;
    >
    > char c;
    >
    > } x[N];
    >
    >
    >
    >
    >
    > struct __attribute__((__packed__)) struct_con_packing
    >
    > {
    >
    > char a;
    >
    > int b;
    >
    > char c;
    >
    > } y[N];
    >
    >
    >
    >
    >
    > int main(void)
    >
    > {
    >
    > int i;
    >
    > clock_t tic, toc;
    >
    >
    >
    > printf("tamaño(x) = %d Mb, tamaño(y) = %d Mb\n", sizeof(x)/1024/1024, sizeof(y)/1024/1024);
    >
    >
    >
    > tic = clock();
    >
    > for (i=0; i<N; i++) { x.a = 'Q'; x.b = 1000; x.c = 'R'; }
    >
    > toc = clock();
    >
    > printf("Tiempo con padding = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);
    >
    >
    >
    > tic = clock();
    >
    > for (i=0; i<N; i++) { y.a = 'Q'; y.b = 1000; y.c = 'R'; }
    >
    > toc = clock();
    >
    > printf("Tiempo con packing = %lf segundos\n", (double)(toc-tic)/CLOCKS_PER_SEC);
    >
    >
    >
    > return 0;
    >
    > }
    >
    >
    >
    >
    >
    > -------------------------------------------------------------------------------
    >
    >
    >
    >
    >
    >
    >
    > [daalvarez@earendil ~]$ gcc -Wall -o 07_struct_padding2 07_struct_padding2.c
    >
    > [daalvarez@earendil ~]$ ./07_struct_padding2
    >
    > tamaño(x) = 114 Mb, tamaño(y) = 57 Mb
    >
    > Tiempo con padding = 0.470000 segundos
    >
    > Tiempo con packing = 0.180000 segundos
    >
    >
    >
    >
    >
    > [daalvarez@earendil ~]$ gcc --version
    >
    > gcc (GCC) 4.5.2
    >
    > Copyright (C) 2010 Free Software Foundation, Inc.
    >
    > This is free software; see the source for copying conditions. There is NO
    >
    > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    >
    >
    >
    > [daalvarez@earendil ~]$
    88888 Dihedral, May 3, 2013
    #8
  9. In article <km0l3f$501$>,
    Ken Brody <> wrote:
    >
    >Question:
    >
    >Assume that the above is compiled on a 32-bit little-endian computer, such
    >as an Intel x86 CPU. The 6 bytes in y would end up containing the
    >following hex values:
    >
    > 51 E8 03 00 00 52
    >
    >Following the "as-if" rule, couldn't the compiler optimize the loop to:
    >
    > Store 32-bit value 0x0003E851 at address X
    > Store 16-bit value 0x5200 at address X+4
    >
    >I'd be curious as to the actual code generated by the OP's compiler.


    Indeed. Any chance of seeing the assembly code generated for this?

    --
    -Ed Falk,
    http://thespamdiaries.blogspot.com/
    Edward A. Falk, May 3, 2013
    #9
  10. Robert Wessel <> wrote:

    (snip)

    > Disk manufacturers have used the "correct" (10**3) meaning for the SI
    > prefixes mostly forever (IBM's marketing literature for the 3380
    > mentioned that a string could hold more than "10 billion" bytes,
    > although they didn't use "GB" in the original literature, they did
    > later:
    > http://www-03.ibm.com/ibm/history/exhibits/storage/storage_3380c.html).
    > Some floppy disk formats, and CD-ROMs (but not DVDs, Blue-Ray, and
    > other optical media) are the exceptions. It became something of an
    > issue when consumer disks started getting bigger than a few tens of
    > megabytes started showing a meaning difference from the "expected"
    > size.


    IBM disk drives tend not to use power of two sizes. The track size on
    the 3330 is 13030 bytes (one block per track).

    13030*404*19=100018280 bytes per pack. Note that 13030, 404, and 19
    are all not powers of two. (The 19 track/cylinder is because of the
    rotational position sensing information on the 20th track.)

    -- glen
    glen herrmannsfeldt, May 3, 2013
    #10
  11. christian.bau <> wrote:

    (snip)

    > One thing that annoys me with x86 compilers is that they compile 80
    > bit long double to have a size of 16 bytes instead of 10, so one
    > megabyte cannot hold 100,000 long doubles but only 62,500. On a modern
    > processor, the non-aligned access makes no difference (as far as I can
    > measure; but the smaller array sizes reduce bandwidth needs.


    Some years ago there were complaints about slow access to double on
    odd four byte boundaries. The 80486 was a 32 bit data bus, and so didn't
    notice. The MS compilers, and maybe others, put allocated to four byte
    boundaries. The the pentium and successors came along with 64 bit data
    bus, and things slowed down.

    I don't know what the natural alignment is for temporary real.

    -- glen
    glen herrmannsfeldt, May 4, 2013
    #11
  12. Geoff Guest

    On Sun, 05 May 2013 00:29:52 -0500, Robert Wessel
    <> wrote:

    >On Fri, 3 May 2013 21:53:51 +0000 (UTC), glen herrmannsfeldt
    ><> wrote:
    >
    >>Robert Wessel <> wrote:
    >>
    >>(snip)
    >>
    >>> Disk manufacturers have used the "correct" (10**3) meaning for the SI
    >>> prefixes mostly forever (IBM's marketing literature for the 3380
    >>> mentioned that a string could hold more than "10 billion" bytes,
    >>> although they didn't use "GB" in the original literature, they did
    >>> later:
    >>> http://www-03.ibm.com/ibm/history/exhibits/storage/storage_3380c.html).
    >>> Some floppy disk formats, and CD-ROMs (but not DVDs, Blue-Ray, and
    >>> other optical media) are the exceptions. It became something of an
    >>> issue when consumer disks started getting bigger than a few tens of
    >>> megabytes started showing a meaning difference from the "expected"
    >>> size.

    >>
    >>IBM disk drives tend not to use power of two sizes. The track size on
    >>the 3330 is 13030 bytes (one block per track).
    >>
    >>13030*404*19=100018280 bytes per pack. Note that 13030, 404, and 19
    >>are all not powers of two. (The 19 track/cylinder is because of the
    >>rotational position sensing information on the 20th track.)

    >
    >
    >True, although FBA-style disks don't really either, except that
    >whatever the size is in some multiple of 512 or 4096, which is almost
    >never (ever?) a power of two.
    >
    >But the point is that I don't think anyone has ever actually
    >documented any hard drive being sold in binary units. 10MB was always
    >10 million byte, often plus a smidge. After many iterations (and
    >years) on the Wikipedia article, the statement about hard drives was
    >changed to unequivocal after absolutely no evidence to the contrary
    >was found. If any such usage actually existed (and it could well), it
    >was extremely rare.


    Actually, what was happening was precisely that they _weren't_ using
    powers of two but neglecting to specify that fact.

    Memory was sold in units of powers-of-two bytes, as in 1MB = 1,048,576
    bytes and it was called one megabyte. Drive manufacturers were taking
    advantage of this and rating their drives in 1MB = 1,000,000 bytes,
    giving a 10 MB hard drive specification that inflated the expectation
    in the customer's mind by implying that it was that much bigger than
    it really was. The larger the disk, the greater the discrepancy. They
    were using the MB term but not telling you that it didn't stand for
    the same MB that memory manufacturers (and everyone else) were using.

    Then there's the old formatted vs unformatted specification
    controversy.

    Then Microsoft comes along and counts bytes in one interface and
    counts MB and GB in another interface where 2,446,950,400 bytes
    becomes 2.27GB in one display and 2,389,600 kB in another. :)
    Geoff, May 5, 2013
    #12
  13. Geoff Guest

    On Sun, 05 May 2013 01:50:59 -0500, Robert Wessel
    <> wrote:

    >I've never really bought that argument. There is one unambiguously
    >correct definition for M (1,000,000), and it is used in a vast number
    >of places. Even within computing, M=1,000,000 is used in more
    >distinct places than M=1,048,576, although there are a
    >disproportionate number of instances of the latter uses. Other than
    >memory, M=1,048,576, is basically is *not* used (there are of course a
    >few exceptions).


    Well, I can't make you buy it. It's ancient history to me now but I
    remember many discussions about it in the early or mid-70's.

    My electronics engineering background has taught me M = mega =
    1,000,000 and I never expected a 2 MegOhm resistor to be 2^20 Ohms but
    computer geeks were calling _mega_bytes_ by powers of two and those of
    us who knew better had to live with it.

    IEEE eventually had to step in and make up abbreviations for powers of
    ten vs. powers of two.

    http://en.wikipedia.org/wiki/Megabyte


    Suffice to say there exist two definitions of mega, 10^6 for the
    scientists, engineers and public and 2^20 for the hardware and
    software geeks.
    Geoff, May 5, 2013
    #13
  14. Robert Wessel <> wrote:

    (snip)

    > I've never really bought that argument. There is one unambiguously
    > correct definition for M (1,000,000), and it is used in a vast number
    > of places. Even within computing, M=1,000,000 is used in more
    > distinct places than M=1,048,576, although there are a
    > disproportionate number of instances of the latter uses. Other than
    > memory, M=1,048,576, is basically is *not* used (there are of course a
    > few exceptions). Did Intel mislead people when the sold a 100MHz
    > processor? Were people really expecting it to run at 104,857,600Hz?
    > Or expecting their 19.2kb modems to run 19,660 baud? Or the
    > computer's bus..., or disk transfer rate..., or the LAN..., or...?


    > This is perhaps an Americanism. Where else* could any non-specialists
    > think it's reasonable that M is *not* 1,000,000?


    In the S/370 days, I used to wonder if IBM prices processors in $K,
    if that was $1000 or $1024.

    Well, frequencies were in kHz and MHz before the question of computer
    clock speed came up.

    In the early years, there were computers with decimal addressing, and so
    powers of 10 (or small multiples of powers of 10) capacity.

    -- glen
    glen herrmannsfeldt, May 5, 2013
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. abilash
    Replies:
    1
    Views:
    555
    Jim Lewis
    May 17, 2005
  2. Tim Frawley
    Replies:
    1
    Views:
    404
    Hermit Dave
    Sep 29, 2004
  3. John Rivers

    VS.NET is 10 times slower than VB6

    John Rivers, Aug 23, 2005, in forum: ASP .Net
    Replies:
    91
    Views:
    3,086
    =?Utf-8?B?TWFuaWthbmRhbg==?=
    Nov 2, 2005
  4. croeltgen
    Replies:
    1
    Views:
    485
    Andrew Thompson
    Oct 25, 2004
  5. Andre Charbonneau

    XPath queries getting slower and slower...

    Andre Charbonneau, Feb 15, 2005, in forum: Java
    Replies:
    0
    Views:
    528
    Andre Charbonneau
    Feb 15, 2005
Loading...

Share This Page