Performance help

Discussion in 'C++' started by Steve Chow, Sep 10, 2007.

  1. Steve Chow

    Steve Chow Guest

    hi, i'm reading a binary data file, which is basically a list of 7296
    offsets pointing to data blocks which represent scanlines containing
    three numbers; leftx, id, rightx. i can read these fine and push them
    into vectors using;

    for (int a = 0; a < 7296; a++)
    {
        for (int c = 1; ; c = c + 2)
        {
            province *temp_prov = new province;
            cords *temp_cords = new cords;

            file.seekg (4*(7296+(1*c)+offsets[a]));

            file.read ((char *)&temp_cords->left, sizeof(short));
            file.read ((char *)&temp_prov->id, sizeof(short));
            file.read ((char *)&temp_cords->right, sizeof(short));
            temp_cords->line = a;

            temp_prov->lines.push_back (*temp_cords);
            provinces.push_back (*temp_prov);

            if (temp_cords->right == 18944)
            {
                delete temp_cords;
                delete temp_prov;
                break;
            }
            else
            {
                delete temp_cords;
                delete temp_prov;
            }
        }
    }

    while ugly and most likely wrong, i /assure/ you it works. here is my
    problem; i want to move the line data of all areas with a matching id
    into one vector, and then remove them. i was thinking something like

    for (int a = 0; a < provinces.size (); a++)
    {
        for (int b = 0; b < provinces.size (); b++)
        {
            if (provinces[a].id == provinces[b].id && a != b)
            {
                provinces[a].lines.push_back(provinces[b].lines[0]);
                provinces.erase (provinces.begin()+b);
            }
        }
    }

    but this is SLOWER than slow and im sure its probably 100% wrong but
    i'll never know because it apparently will take hours to complete. i
    should mention that the number of elements in provinces is 557,406.
    any alternative approaches that'll let me do this quickly on a Pentium
    3 800?

    ps; if you want a better description of the file than i can provide
    http://www.inferis.org/eu2/tbl/id.tbl.asp
     
    Steve Chow, Sep 10, 2007
    #1

  2. Steve Chow wrote:
    > [..] here is my
    > problem; i want to move the line data of all areas with a matching id
    > into one vector, and then remove them. i was thinking something like
    >
    > for (int a = 0; a < provinces.size (); a++)
    > {
    > for (int b = 0; b < provinces.size (); b++)
    > {
    > if (provinces[a].id == provinces[b].id && a != b)
    > {
    > provinces[a].lines.push_back(provinces[b].lines[0]);
    > provinces.erase (provinces.begin()+b);
    > }
    > }
    > }
    >
    > but this is SLOWER than slow and im sure its probably 100% wrong but
    > i'll never know because it apparently will takes hours to complete. i
    > should mention that the amount of elements in provinces is; 557,406.
    > any alternative approaches that'll let me do this quickly on a Pentium
    > 3 800? [..]


    Don't know about your particular compiler, but if you (a) instead of
    erasing the 'b' element right away mark it for erasure later, and then
    (b) do a single remove_if() on your vector followed by one erase:

        provinces.erase(std::remove_if(provinces.begin(),
                                       provinces.end(),
                                       is_specially_marked()),
                        provinces.end());

    you're going to get the improvement you seek.
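
    A minimal sketch of that mark-and-sweep idea, assuming 'province' is
    given an extra bool member (here called 'marked', initialised to false)
    and that is_specially_marked() is a small hand-written predicate; both
    names are hypothetical. Needs <algorithm> and <cstddef>:

        // Hypothetical predicate: true for provinces flagged for removal.
        struct is_specially_marked
        {
            bool operator() (const province &p) const { return p.marked; }
        };

        // Merge pass: flag duplicates instead of erasing them one at a time.
        for (std::size_t a = 0; a < provinces.size (); ++a)
        {
            if (provinces[a].marked)
                continue;
            for (std::size_t b = a + 1; b < provinces.size (); ++b)
            {
                if (!provinces[b].marked && provinces[b].id == provinces[a].id)
                {
                    provinces[a].lines.push_back (provinces[b].lines[0]);
                    provinces[b].marked = true;
                }
            }
        }

        // One compaction instead of hundreds of thousands of mid-vector erases.
        provinces.erase (std::remove_if (provinces.begin (), provinces.end (),
                                         is_specially_marked ()),
                         provinces.end ());

    Note that the pair of loops above still does O(N*N) comparisons; the
    saving is only that the vector is compacted once at the end instead of
    being shuffled on every erase.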

    Also, while it's not necessarily an improvement, try 'std::list' with
    your current algorithm (you won't be able to use index 'a' or 'b', but
    use the iterators instead, same difference).
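
    For what it's worth, a rough sketch of the same algorithm on a
    std::list, where erasing in the middle is cheap; the iterator juggling
    around erase() is what replaces the 'a'/'b' indices. Needs <list>:

        std::list<province> prov_list (provinces.begin (), provinces.end ());

        for (std::list<province>::iterator a = prov_list.begin ();
             a != prov_list.end (); ++a)
        {
            std::list<province>::iterator b = prov_list.begin ();
            while (b != prov_list.end ())
            {
                if (b != a && b->id == a->id)
                {
                    a->lines.push_back (b->lines[0]);
                    b = prov_list.erase (b);   // erase returns the next element
                }
                else
                    ++b;
            }
        }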

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Sep 10, 2007
    #2

  3. Steve Chow

    Jerry Coffin Guest

    In article <>,
    says...

    [ ... ]

    > for (int a = 0; a < 7296; a++)
    > {
    > for (int c = 1; ; c=c+2)
    > {
    > province *temp_prov = new province;
    > cords *temp_cords = new cords;


    There doesn't seem to be any reason to allocate these dynamically. I'd
    just use:

    province temp_prov;
    cords temp_cords;
    >
    > file.seekg (4*(7296+(1*c)+offsets[a]));
    >
    > file.read ((char *)&temp_cords->left, sizeof(short));
    > file.read ((char *)&temp_prov->id, sizeof(short));
    > file.read ((char *)&temp_cords->right, sizeof(short));
    > temp_cords->line = a;
    >
    > temp_prov->lines.push_back (*temp_cords);
    > provinces.push_back (*temp_prov);


    Changing those to automatic variables means you can reduce this:
    >
    > if (temp_cords->right == 18944)
    > {
    > delete temp_cords;
    > delete temp_prov;
    > break;
    > }
    > else
    > {
    > delete temp_cords;
    > delete temp_prov;
    > }


    Down to simply:

        if (temp_cords.right == 18944)
            break;

    The next step would be to overload operator>> for struct cords to make
    the mainline of the code a bit cleaner:

    std::istream &operator>>(std::istream &is, cords &c) {
        is.read((char *)&c.left, sizeof(c.left));
        is.read((char *)&c.id, sizeof(c.id));
        is.read((char *)&c.right, sizeof(c.right));
        return is;
    }

    Then your loop looks something like this:

    for (int a=0; a<7296; a++) {
        cords temp_cords = {};

        for (int c=1; temp_cords.right != 18944; c+=2) {
            province temp_prov;

            file.seekg(4*(7296+c+offsets[a]));

            file >> temp_cords;
            temp_cords.line = a;

            temp_prov.lines.push_back(temp_cords);
            provinces.push_back(temp_prov);
        }
    }

    > while ugly and most likely wrong, i /assure/ you it works. here is my
    > problem; i want to move the line data of all areas with a matching id
    > into one vector, and then remove them. i was thinking something like
    >
    > for (int a = 0; a < provinces.size (); a++)
    > {
    > for (int b = 0; b < provinces.size (); b++)
    > {
    > if (provinces[a].id == provinces[b].id && a != b)
    > {
    > provinces[a].lines.push_back(provinces[b].lines[0]);
    > provinces.erase (provinces.begin()+b);
    > }
    > }
    > }


    Okay, if I understand this, the idea is to merge all the lines from
    provinces that have the same id into a single province. If that's the
    case, I'd start by sorting based on id -- then all the lines with the
    same id will be right next to each other. Being the way I am, I'd
    probably also avoid doing the manipulation in-place. Instead, I'd modify
    the previous loop to just create a vector of cords. Then, after I was
    done reading those, I'd sort the cords by id, and then copy the data for
    the cords into the vector of provinces.

    Right now, you have an O(N*N) pair of loops, so it executes roughly
    N*N iterations.

    A quick sort (for example) will do the sort in approximately N*lg2(N)
    operations, and then the merge will take about N operations.

    Since N is 557,406, your pair of loops executes 310,701,448,836
    iterations. The quick sort will take approximately 10,639,972 operations
    and the merge takes 557,406 operations.

    If we figure all the operations involved are about the same speed, the
    sort/merge should be about 29,000 _times_ as fast as what you're doing
    now. Some of the operations in the sort are probably a little slower,
    but I'd expect a _substantial_ improvement anyway.
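
    A rough sketch of that sort-and-merge, assuming 'cords' carries the id
    as well (as the operator>> above implies) and that the reading loop
    fills a plain std::vector<cords> first; 'all_cords' and 'merged' are
    made-up names. Needs <algorithm>, <vector> and <cstddef>:

        // At namespace scope: order the cords so equal ids end up adjacent.
        bool id_less (const cords &x, const cords &y) { return x.id < y.id; }

        // Inside the function that did the reading:
        std::vector<cords> all_cords;           // filled by the reading loop

        std::sort (all_cords.begin (), all_cords.end (), id_less);  // ~N lg N

        std::vector<province> merged;           // one province per distinct id
        for (std::size_t i = 0; i < all_cords.size (); )             // ~N
        {
            province p;
            p.id = all_cords[i].id;
            // Consume the whole run of equal ids in one go.
            for (; i < all_cords.size () && all_cords[i].id == p.id; ++i)
                p.lines.push_back (all_cords[i]);
            merged.push_back (p);
        }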

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Sep 10, 2007
    #3
  4. Steve Chow

    LR Guest

    Steve Chow wrote:
    > hi, i'm reading a binary data file, which is basically a list of 7296
    > offsets pointing to data blocks which represent scanlines containing
    > three numbers; leftx, id, rightx. i can read these fine and push them
    > into vectors using;


    What kind of vector do you push these onto?

    provinces? What type is that?


    >
    > for (int a = 0; a < 7296; a++)
    > {
    > for (int c = 1; ; c=c+2)
    > {


    It's not clear to me why you're new'ing these each time through
    the loop. Why not just make temps like,
    cords temp_cords;

    > //province *temp_prov = new province;
    > //cords *temp_cords = new cords;
    >
    > file.seekg (4*(7296+(1*c)+offsets[a]));
    >

    have to rewrite temp_cords-> as needed
    > file.read ((char *)&temp_cords->left, sizeof(short));
    > file.read ((char *)&temp_prov->id, sizeof(short));
    > file.read ((char *)&temp_cords->right, sizeof(short));
    > temp_cords->line = a;
    >

    province temp_prov;
    > temp_prov.lines.push_back (temp_cords);
    > provinces.push_back (temp_prov);


    ok, here if you're deleting in both cases,
    if you use new, why not just delete
    unconditionally?
    delete temp_cords;
    delete temp_prov;

    >
    > if (temp_cords->right == 18944)
    > {
    > break;
    > }

    and this whole else block has no real purpose
    that I can see
    > else
    > {
    > delete temp_cords;
    > delete temp_prov;
    > }
    > }
    > }
    >
    > while ugly and most likely wrong, i /assure/ you it works. here is my
    > problem; i want to move the line data of all areas with a matching id
    > into one vector, and then remove them. i was thinking something like
    >
    > for (int a = 0; a < provinces.size (); a++)
    > {
    > for (int b = 0; b < provinces.size (); b++)


    starting here from zero each time?
    doesn't that mean that you'll count some
    instances twice?

    consider
    .... b = a + 1; ....

    BTW, if you have a vector,
    std::vector<SomeType> vec;
    then your for loop should probably look like:
    for(std::vector<SomeType>::size_type i = 0; ...
    some compilers will warn about the signed/unsigned comparison if you use int.

    > {
    > if (provinces[a].id == provinces[b].id && a != b)


    I don't know what provinces[a].id is, and I
    don't know how it got initialized.

    > {
    > provinces[a].lines.push_back(provinces[b].lines[0]);
    > provinces.erase (provinces.begin()+b);


    what should the next value of b be as
    you go through the loop?
    > }
    > }
    > }
    >
    > but this is SLOWER than slow and im sure its probably 100% wrong but
    > i'll never know because it apparently will takes hours to complete. i
    > should mention that the amount of elements in provinces is; 557,406.


    Then perhaps you should construct a smaller data test set so you can
    test your algorithm.

    > any alternative approaches that'll let me do this quickly on a Pentium
    > 3 800?


    And perhaps, more importantly, correctly? I don't think that the
    platform is relevant.

    BTW, have you thought about std::map?
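
    In case it helps, a minimal sketch of the std::map idea, building one
    province per id while reading instead of merging afterwards. It assumes,
    as the posted code implies, that province holds "short id;" and
    "std::vector<cords> lines;"; 'by_id' and 'add_line' are made-up names.
    Needs <map>:

        // Hypothetical helper; by_id sits where the provinces vector used to.
        void add_line (std::map<short, province> &by_id, short id, const cords &c)
        {
            province &p = by_id[id];   // operator[] creates the entry the
                                       // first time this id is seen
            p.id = id;
            p.lines.push_back (c);
        }

    Calling add_line(by_id, id, temp_cords) from inside the reading loop
    replaces both push_back calls, and walking the map afterwards hands back
    the provinces already grouped (and sorted) by id.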



    LR
     
    LR, Sep 10, 2007
    #4
  5. Steve Chow

    upashu2 Guest

    Steve Chow wrote:
    > while ugly and most likely wrong, i /assure/ you it works. here is my
    > problem; i want to move the line data of all areas with a matching id
    > into one vector, and then remove them. i was thinking something like
    >
    > for (int a = 0; a < provinces.size (); a++)
    > {
    > for (int b = 0; b < provinces.size (); b++)
    > {
    > if (provinces[a].id == provinces[b].id && a != b)
    > {
    > provinces[a].lines.push_back(provinces[b].lines[0]);
    > provinces.erase (provinces.begin()+b);
    > }
    > }
    > }
    >
    > but this is SLOWER than slow and im sure its probably 100% wrong but


    Bad code using vectors.
    Because vectors keep an array layout, erasing at any position other
    than the end also moves every element after the erased segment to its
    new position, which is not as efficient as erasing in other kinds of
    sequence containers (deque, list). It also invalidates all iterators
    and references to elements after that position.

    Vectors are good only at removing elements from the end, not from the
    middle. You are doing N*N erases on a vector of large size, so it is
    bound to be slow.
     
    upashu2, Sep 10, 2007
    #5
  6. Steve Chow <> writes:


    [...]

    > i want to move the line data of all areas with a matching id
    > into one vector, and then remove them. i was thinking something like


    As others have mentioned, a vector probably isn't the right data
    structure for this. An STL multimap sounds like it might do most of the work
    for you, and the nonstandard but common hash_multimap would be
    somewhat faster if you have one (there's probably one in Boost if
    not). You would use the province ID as the key. See:

    http://www.sgi.com/tech/stl/Multimap.html
    http://www.sgi.com/tech/stl/hash_multimap.html
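
    A small sketch of how the multimap version might look, with the province
    id as the key and each cords value stored under it; 'lines_by_id' and
    'some_id' are only illustrative names. Needs <map> and <utility>:

        std::multimap<short, cords> lines_by_id;

        // In the reading loop, instead of building provinces directly:
        //     lines_by_id.insert (std::make_pair (id, temp_cords));

        // Later, every line belonging to one id comes back as one range:
        short some_id = 1;   // whichever province id we're after
        typedef std::multimap<short, cords>::iterator iter;
        std::pair<iter, iter> range = lines_by_id.equal_range (some_id);

        province p;
        p.id = some_id;
        for (iter it = range.first; it != range.second; ++it)
            p.lines.push_back (it->second);   // it->second is one scanline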

    -----Scott.
     
    Scott Gifford, Sep 10, 2007
    #6
  7. Steve Chow

    Steve Chow Guest

    thanks for all your help! if you're interested i'll post the code i
    come up with after taking all your advice.
     
    Steve Chow, Sep 10, 2007
    #7
  8. Steve Chow

    Jerry Coffin Guest

    In article <>, says...

    [ ... ]

    > As others have mentioned, a vector probably isn't the right data
    > structure for this.


    I think a vector can work just fine for this -- but it requires doing
    the job a bit differently. If (as I advised) you start with separate
    vectors, so that you copy the data from one to the other and then clear
    the first one all at once, you encounter no problem.

    If the extra memory usage from temporarily having two complete copies of
    the data is a problem, you can avoid that as well: iterate through the
    first vector in reverse, and erase each item after its data has been
    copied to the second vector. Since this only ever erases items from the
    end, the erases are quick and efficient.
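
    A small sketch of that second variant, assuming the merged results are
    appended to a second vector called 'merged' (the name is made up); every
    removal here takes the current last element, so nothing ever has to be
    shifted:

        std::vector<province> merged;

        // Walk the source back to front; pop_back never moves other elements.
        while (!provinces.empty ())
        {
            merged.push_back (provinces.back ());   // copy the data out ...
            provinces.pop_back ();                  // ... then drop it from the end
        }

    (The copies come out in reverse order, which doesn't matter if they get
    sorted or merged by id afterwards.)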

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Sep 11, 2007
    #8
