Open letter to Ian Collins

Discussion in 'C Programming' started by jacob navia, May 26, 2012.

  1. jacob navia

    jacob navia Guest

    Hi Ian

    In the c++ newsgroup you asked me for an example of the ccl performance
    It was something with building a huge list, then sorting it, if I
    remember correctly...

    I have built this example code:

    #include <stdlib.h>
    #include "containers.h"
    #define N 10000000
    int main(void)
    {
    List *L1;
    size_t i,d;
    long long sum=0,newSum=0;
    Iterator *it;
    int *pint;

    L1 = iList.Create(sizeof(int));
    iList.UseHeap(L1,NULL);
    for (i=0; i<N;i++) {
    d = rand();
    sum += d;
    iList.Add(L1,&d);
    }
    iList.Sort(L1);
    it = iList.NewIterator(L1);
    for (pint = it->GetFirst(it); pint; pint = it->GetNext(it)) {
    newSum += *pint;
    }
    if (sum != newSum)
    printf("Failed\n");
    else printf("Sum = %lld\n",sum);
    iList.Finalize(L1);
    }

    This creates a list, then fills it with 10 million random integers,
    then sorts it and verifies that all the data is still there by
    accessing all elements of the list with an iterator.

    Can you please send me an equivalent C++program?

    Thanks
     
    jacob navia, May 26, 2012
    #1
    1. Advertising

  2. jacob navia

    Ike Naar Guest

    On 2012-05-26, jacob navia <> wrote:
    > Hi Ian
    >
    > In the c++ newsgroup you asked me for an example of the ccl performance
    > It was something with building a huge list, then sorting it, if I
    > remember correctly...
    >
    > I have built this example code:
    >
    > #include <stdlib.h>
    > #include "containers.h"
    > #define N 10000000
    > int main(void)
    > {
    > List *L1;
    > size_t i,d;
    > long long sum=0,newSum=0;
    > Iterator *it;
    > int *pint;
    >
    > L1 = iList.Create(sizeof(int));
    > iList.UseHeap(L1,NULL);
    > for (i=0; i<N;i++) {
    > d = rand();
    > sum += d;
    > iList.Add(L1,&d);


    This seems dangerous. iList.Add expects a pointer to an
    int for the second argument, and it receives a pointer to
    a size_t. What if int and size_t have different sizes?
     
    Ike Naar, May 26, 2012
    #2
    1. Advertising

  3. בת×ריך ×™×•× ×©×‘×ª,26 במ××™ 2012 23:22:01 UTC+1, מ×ת jacob navia:
    > iList.Sort(L1);
    > Can you please send me an equivalent C++program?
    >

    You very rarely want to sort a list of integers or other atomic type. Almost always you want to sort records based on a field, or on two fields, e.g. surname followed by intials, or you want to do somethign funny, like sorting strings alphabetically but putting embedded numbers in numeric order so that fred_99 comes before fred_100.
     
    Malcolm McLean, May 26, 2012
    #3
  4. jacob navia

    MelissA Guest

    On Sun, 27 May 2012 00:22:01 +0200
    jacob navia <> wrote:

    > Hi Ian
    >
    > In the c++ newsgroup you asked me for an example of the ccl
    > performance It was something with building a huge list, then sorting
    > it, if I remember correctly...
    >
    > I have built this example code:
    >
    > #include <stdlib.h>
    > #include "containers.h"
    > #define N 10000000
    > int main(void)
    > {
    > List *L1;
    > size_t i,d;
    > long long sum=0,newSum=0;
    > Iterator *it;
    > int *pint;
    >
    > L1 = iList.Create(sizeof(int));
    > iList.UseHeap(L1,NULL);
    > for (i=0; i<N;i++) {
    > d = rand();
    > sum += d;
    > iList.Add(L1,&d);
    > }
    > iList.Sort(L1);
    > it = iList.NewIterator(L1);
    > for (pint = it->GetFirst(it); pint; pint = it->GetNext(it)) {
    > newSum += *pint;
    > }
    > if (sum != newSum)
    > printf("Failed\n");
    > else printf("Sum = %lld\n",sum);
    > iList.Finalize(L1);
    > }
    >
    > This creates a list, then fills it with 10 million random integers,
    > then sorts it and verifies that all the data is still there by
    > accessing all elements of the list with an iterator.
    >
    > Can you please send me an equivalent C++program?
    >
    > Thanks


    Well, I have tested list sorting and traversing and quick sort
    by value (doesn't change nodes order) is much faster than
    standard stl list sort (which changes nodes order).
    Also after sort that doesn't change nodes order, traversing
    is much faster then after sort that does change nodes order.
    That is of course if initial list nodes are consecutive.
    Doesn't matter how much cache cpu has 512kb is enough
    to show significant difference.

    Greetings!
     
    MelissA, May 26, 2012
    #4
  5. jacob navia

    jacob navia Guest

    Le 27/05/12 00:36, Ike Naar a écrit :
    > This seems dangerous. iList.Add expects a pointer to an
    > int for the second argument, and it receives a pointer to
    > a size_t. What if int and size_t have different sizes?


    Well, the list software will read the lower sizeof(int) bytes, that's
    all.

    The other way around would be dangerous: passing a pointer to int to a
    size_t;

    But strictly speaking you are right of course. Will change that.
     
    jacob navia, May 27, 2012
    #5
  6. jacob navia

    jacob navia Guest

    Le 27/05/12 00:41, Malcolm McLean a écrit :
    > בת×ריך ×™×•× ×©×‘×ª, 26 במ××™ 2012 23:22:01 UTC+1, מ×ת jacob navia:
    >> iList.Sort(L1);
    >> Can you please send me an equivalent C++program?
    >>

    > You very rarely want to sort a list of integers or other atomic type.


    Almost always you want to sort records based on a field, or on two fields,

    e.g. surname followed by intials, or you want to do somethign funny,

    like sorting strings alphabetically but putting embedded numbers in

    numeric order so that fred_99 comes before fred_100.
    >
    >


    I know, it is just for comparing performance of ccl vs stl
     
    jacob navia, May 27, 2012
    #6
  7. בת×ריך ×™×•× ×©×‘×ª,26 במ××™ 2012 23:36:29 UTC+1, מ×ת Ike Naar:
    >
    > This seems dangerous. iList.Add expects a pointer to an
    > int for the second argument, and it receives a pointer to
    > a size_t. What if int and size_t have different sizes?
    >

    Yes, size_t is a menace. I've been saying that for a while. The problem is that programmers are unwilling to use it consistently. If you want an integer, you type int.
     
    Malcolm McLean, May 27, 2012
    #7
  8. On Sun, 27 May 2012 01:13:37 +0200, jacob navia <>
    wrote:

    >Le 27/05/12 00:36, Ike Naar a écrit :
    >> This seems dangerous. iList.Add expects a pointer to an
    >> int for the second argument, and it receives a pointer to
    >> a size_t. What if int and size_t have different sizes?

    >
    >Well, the list software will read the lower sizeof(int) bytes, that's
    >all.


    And on a big-endian machine this will produce the correct value?

    Not to mention the mandatory diagnostic during compilation.

    >
    >The other way around would be dangerous: passing a pointer to int to a
    >size_t;
    >
    >But strictly speaking you are right of course. Will change that.


    Even casually speaking.

    --
    Remove del for email
     
    Barry Schwarz, May 27, 2012
    #8
  9. On 26-May-12 17:22, jacob navia wrote:
    > Hi Ian
    >
    > In the c++ newsgroup you asked me for an example of the ccl performance
    > It was something with building a huge list, then sorting it, if I
    > remember correctly...


    Then shouldn't you have posted the response to "the c++ newsgroup"
    rather than comp.lang.c?

    > [a bunch of C++ code]


    Off-topic here.

    S

    --
    Stephen Sprunk "God does not play dice." --Albert Einstein
    CCIE #3723 "God is an inveterate gambler, and He throws the
    K5SSS dice at every possible opportunity." --Stephen Hawking
     
    Stephen Sprunk, May 27, 2012
    #9
  10. jacob navia

    Ian Collins Guest

    On 05/27/12 10:22 AM, jacob navia wrote:
    > Hi Ian
    >
    > In the c++ newsgroup you asked me for an example of the ccl performance
    > It was something with building a huge list, then sorting it, if I
    > remember correctly...
    >
    > I have built this example code:
    >
    > #include<stdlib.h>
    > #include "containers.h"
    > #define N 10000000
    > int main(void)
    > {
    > List *L1;
    > size_t i,d;
    > long long sum=0,newSum=0;
    > Iterator *it;
    > int *pint;


    All good code should have at least one beer reference!

    It be a lot easier to see what is what if these were declared on first use.

    > L1 = iList.Create(sizeof(int));
    > iList.UseHeap(L1,NULL);
    > for (i=0; i<N;i++) {
    > d = rand();
    > sum += d;
    > iList.Add(L1,&d);
    > }
    > iList.Sort(L1);
    > it = iList.NewIterator(L1);
    > for (pint = it->GetFirst(it); pint; pint = it->GetNext(it)) {
    > newSum += *pint;
    > }
    > if (sum != newSum)
    > printf("Failed\n");
    > else printf("Sum = %lld\n",sum);
    > iList.Finalize(L1);
    > }
    >
    > This creates a list, then fills it with 10 million random integers,
    > then sorts it and verifies that all the data is still there by
    > accessing all elements of the list with an iterator.
    >
    > Can you please send me an equivalent C++program?


    I did (with the addition of using one list to initialise another) in the
    c.l.c++ thread you should have posted this in response to.

    --
    Ian Collins
     
    Ian Collins, May 27, 2012
    #10
  11. jacob navia

    Ian Collins Guest

    On 05/27/12 10:22 AM, jacob navia wrote:
    > Hi Ian
    >
    > In the c++ newsgroup you asked me for an example of the ccl performance
    > It was something with building a huge list, then sorting it, if I
    > remember correctly...
    >
    > I have built this example code:
    >

    <snip>

    I get a segv when I run the example. Running in dbx with access
    checking enabled shows:

    Write to unallocated (wua):
    Attempting to write 1 byte at address 0x8091d88
    which is just past heap block of size 800 bytes at 0x8091a68
    This block was allocated from:
    newHeapObject<-NewLink<-Add_nd<-Add<-main
    Location of error: heap.c, line 75, newHeapObject()

    The offending code appears to be here in heap.c:

    /* The array of block pointers is full. Allocate CHUNK_SIZE blocks
    more */
    siz = (l->BlockCount+CHUNK_SIZE)*sizeof(ListElement *);
    result = l->Allocator->realloc(l->Heap,siz);
    if (result == NULL) {
    return NULL;
    }
    l->MemoryUsed+= siz;
    l->Heap = (char **)result;
    /* Position pointer at the start of the new area */
    result += l->BlockCount*sizeof(ListElement *);
    /* Zero the new pointers */
    memset(result,0,siz);

    It looks like you are allocating siz byes, incrementing part (half) way
    through the bock then zeroing the next siz bytes.

    --
    Ian Collins
     
    Ian Collins, May 27, 2012
    #11
  12. jacob navia

    jacob navia Guest

    Le 27/05/12 05:16, Ian Collins a écrit :
    > On 05/27/12 10:22 AM, jacob navia wrote:
    >> Hi Ian
    >>
    >> In the c++ newsgroup you asked me for an example of the ccl performance
    >> It was something with building a huge list, then sorting it, if I
    >> remember correctly...
    >>
    >> I have built this example code:
    >>

    > <snip>
    >
    > I get a segv when I run the example. Running in dbx with access checking
    > enabled shows:


    I corrected that yesterday. When you asked me gthis example I was in the
    middle of a reorganization of heap.c. Look at the latest update
     
    jacob navia, May 27, 2012
    #12
  13. jacob navia

    jacob navia Guest

    Le 27/05/12 03:40, Stephen Sprunk a écrit :
    >> [a bunch of C++ code]


    Why is it C++?

    It compiles with every C compiler you can choose...

    Never mind, don't let facts disturb your beliefs.
     
    jacob navia, May 27, 2012
    #13
  14. jacob navia

    jacob navia Guest

    Le 27/05/12 00:22, jacob navia a écrit :

    [snip code]

    Measuring the speed of the program, you can see that a big part of the
    time is spent in malloc.

    Without the line

    iList.UseHeap(L1);

    we obtain
    real 0m16.842s
    user 0m16.421s
    sys 0m0.401s


    but using a specialized heap we obtain
    real 0m11.541s
    user 0m11.253s
    sys 0m0.253s

    Problem is, a specialized heap (for objects of the same size)
    is difficult to implement correctly. After several tentatives
    I have decided that I would have

    1) An array of block pointers that can be realloced to hold more
    if full.
    2) Each of the positions of the pointers array points to a block of
    1000 objects. Those can't be realloced since they are passed
    to user programs
    3) A bit map that marks which positions are free within the
    structure. It should contain
    Number of blocks * 1000 bits

    When freeing a position I mark the corresponding bit as free
    and put the object in the free list.

    I am still not done with this.

    jacob
     
    jacob navia, May 27, 2012
    #14
  15. jacob navia

    Ian Collins Guest

    On 05/27/12 07:30 PM, jacob navia wrote:
    > Le 27/05/12 05:16, Ian Collins a écrit :
    >> On 05/27/12 10:22 AM, jacob navia wrote:
    >>> Hi Ian
    >>>
    >>> In the c++ newsgroup you asked me for an example of the ccl performance
    >>> It was something with building a huge list, then sorting it, if I
    >>> remember correctly...
    >>>
    >>> I have built this example code:
    >>>

    >> <snip>
    >>
    >> I get a segv when I run the example. Running in dbx with access checking
    >> enabled shows:

    >
    > I corrected that yesterday. When you asked me gthis example I was in the
    > middle of a reorganization of heap.c. Look at the latest update


    OK, I thought I had but I didn't notice wget saving the file as cc.zip.1
    rather than overwriting the version I downloaded earlier in the week.

    Anyway I ran ran the code built with Sun Studio rather than gcc because
    it runs quite a bit faster. As I expected, the big difference was in
    sorting:

    ccl version:

    To sort 85.6384S

    C++ version (std::list.sort()):

    To sort 18.7927S

    Iteration was closer:

    To access 2.80655S (ccl)
    To access 1.82896S

    --
    Ian Collins
     
    Ian Collins, May 27, 2012
    #15
  16. jacob navia

    jacob navia Guest

    Le 27/05/12 10:56, Ian Collins a écrit :
    > As I expected, the big difference was in sorting:
    >
    > ccl version:
    >
    > To sort 85.6384S
    >
    > C++ version (std::list.sort()):
    >
    > To sort 18.7927S


    Yes. But I am not finished yet. I am writing a generic
    qsort module that will avoid the function call overhead of
    the current solution.

    As it is now, qsort calls a user defined sort function, that makes a
    memcmp of the data...

    With the generic qsort function you would directly compare
    the data (without function calls) by specifying a macro
    for comparing your data. For integers it would be simply

    a - b

    what should put me on equal footing to C++.

    > Iteration was closer:


    > To access 2.80655S (ccl)
    > To access 1.82896S


    This is because iterators have a function call overhead.
    But this is surely not catastrophic: if after 10 million
    iterations the difference is just 1 second... it is not
    really bad.
     
    jacob navia, May 27, 2012
    #16
  17. jacob navia

    Ian Collins Guest

    On 05/27/12 09:12 PM, jacob navia wrote:
    > Le 27/05/12 10:56, Ian Collins a écrit :
    >> As I expected, the big difference was in sorting:
    >>
    >> ccl version:
    >>
    >> To sort 85.6384S
    >>
    >> C++ version (std::list.sort()):
    >>
    >> To sort 18.7927S

    >
    > Yes. But I am not finished yet. I am writing a generic
    > qsort module that will avoid the function call overhead of
    > the current solution.
    >
    > As it is now, qsort calls a user defined sort function, that makes a
    > memcmp of the data...
    >
    > With the generic qsort function you would directly compare
    > the data (without function calls) by specifying a macro
    > for comparing your data. For integers it would be simply
    >
    > a - b


    That will be interesting to see.

    --
    Ian Collins
     
    Ian Collins, May 27, 2012
    #17
  18. jacob navia

    BartC Guest

    "Ian Collins" <> wrote in message
    news:...
    > On 05/27/12 10:22 AM, jacob navia wrote:
    >> Hi Ian
    >>
    >> In the c++ newsgroup you asked me for an example of the ccl performance
    >> It was something with building a huge list, then sorting it, if I
    >> remember correctly...
    >>
    >> I have built this example code:
    >>
    >> #include<stdlib.h>
    >> #include "containers.h"
    >> #define N 10000000
    >> int main(void)
    >> {
    >> List *L1;
    >> size_t i,d;
    >> long long sum=0,newSum=0;
    >> Iterator *it;
    >> int *pint;

    >
    > All good code should have at least one beer reference!
    >
    > It be a lot easier to see what is what if these were declared on first
    > use.


    As in, instead of reading a play which starts with a cast of characters,
    they should be introduced whenever they first appear?

    Suppose you're delving into a random scene, (or have an extract for use
    elsewhere), and want to be reminded who a character is, you have to start
    ploughing through from page one first until you encounter their first
    mention?

    And if you are editing the play, and the character's first appearance is
    edited out, or is moved somewhere where it's no longer the first appearance;
    how much work is it going to be to fix up declarations properly?

    Getting back to C, wouldn't macros and enumerations (and file-scope
    variables, external references, etc) also all benefit from being declared
    where they are first used? (And what sort of mess would that make of any
    program...).

    Anyway I thought everyone used smart editors now, where you hover over a
    name or something, and it tells you everything about it.

    --
    Bartc
     
    BartC, May 27, 2012
    #18
  19. jacob navia

    Ian Collins Guest

    On 05/27/12 09:36 PM, BartC wrote:
    >
    >
    > "Ian Collins"<> wrote in message
    > news:...
    >> On 05/27/12 10:22 AM, jacob navia wrote:
    >>> Hi Ian
    >>>
    >>> In the c++ newsgroup you asked me for an example of the ccl performance
    >>> It was something with building a huge list, then sorting it, if I
    >>> remember correctly...
    >>>
    >>> I have built this example code:
    >>>
    >>> #include<stdlib.h>
    >>> #include "containers.h"
    >>> #define N 10000000
    >>> int main(void)
    >>> {
    >>> List *L1;
    >>> size_t i,d;
    >>> long long sum=0,newSum=0;
    >>> Iterator *it;
    >>> int *pint;

    >>
    >> All good code should have at least one beer reference!
    >>
    >> It be a lot easier to see what is what if these were declared on first
    >> use.

    >
    > As in, instead of reading a play which starts with a cast of characters,
    > they should be introduced whenever they first appear?
    >
    > Suppose you're delving into a random scene, (or have an extract for use
    > elsewhere), and want to be reminded who a character is, you have to start
    > ploughing through from page one first until you encounter their first
    > mention?


    A function is more like a limerick than a play.

    > And if you are editing the play, and the character's first appearance is
    > edited out, or is moved somewhere where it's no longer the first appearance;
    > how much work is it going to be to fix up declarations properly?


    No a lot.

    > Getting back to C, wouldn't macros and enumerations (and file-scope
    > variables, external references, etc) also all benefit from being declared
    > where they are first used? (And what sort of mess would that make of any
    > program...).


    I would make them longer, unlike declaring variables when they are
    needed makes them shorter. Besides, how do you introduce a const other
    that where it is initialised?

    > Anyway I thought everyone used smart editors now, where you hover over a
    > name or something, and it tells you everything about it.


    Alas, Thunderbird lack that feature.

    --
    Ian Collins
     
    Ian Collins, May 27, 2012
    #19
  20. jacob navia

    BartC Guest

    "Ian Collins" <> wrote in message
    news:...
    > On 05/27/12 09:36 PM, BartC wrote:
    >> "Ian Collins"<> wrote in message


    >>> It be a lot easier to see what is what if these were declared on first
    >>> use.

    >>
    >> As in, instead of reading a play which starts with a cast of characters,
    >> they should be introduced whenever they first appear?
    >>
    >> Suppose you're delving into a random scene, (or have an extract for use
    >> elsewhere), and want to be reminded who a character is, you have to start
    >> ploughing through from page one first until you encounter their first
    >> mention?

    >
    > A function is more like a limerick than a play.


    In that case I can't see it being a big imposition if the names were
    declared at the top of the function (ie. at the top of the screen if a
    function is small). Then it gets the clutter out of the code. And all uses
    of the name are identical (instead of one of them needing a declaration).
    Although, it will make the function a bit longer..

    >> Getting back to C, wouldn't macros and enumerations .... also
    >> all benefit from being declared where they are first used?


    > I would make them longer, unlike declaring variables when they are needed
    > makes them shorter.


    That's another thing; uses of i, j and k for loop indices is fairly
    standard. They probably will be ints, and the formality of having to declare
    them can be kept out of the way instead of making for-loops even longer.
    (Alternatively for-loop variables could be implicitly declared.)

    > Besides, how do you introduce a const other that where it is initialised?


    What do you mean by a const? If it is a name that is to be used in several
    places, then it can also benefit from splitting declaration (and
    initialisation) from it's use.

    It gets more complicated when declarations in blocks are used (which
    apparently have their own name spaces). But I've never used those, and if
    functions are small as everyone says they should be, I can't see the need.

    (Of course my point of view is from reading code rather than writing it. But
    the former should have priority.)

    --
    Bartc
     
    BartC, May 27, 2012
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. vertigo

    big letter -> small letter

    vertigo, Jul 6, 2004, in forum: Python
    Replies:
    4
    Views:
    815
    Reinhold Birkenfeld
    Jul 6, 2004
  2. Replies:
    0
    Views:
    273
  3. Replies:
    2
    Views:
    610
  4. Paul
    Replies:
    6
    Views:
    292
  5. Replies:
    5
    Views:
    124
    robic0
    Nov 26, 2005
Loading...

Share This Page