Performance help

Steve Chow · Sep 10, 2007

hi, i'm reading a binary data file, which is basically a list of 7296
offsets pointing to data blocks which represent scanlines containing
three numbers; leftx, id, rightx. i can read these fine and push them
into vectors using;

for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
{
province *temp_prov = new province;
cords *temp_cords = new cords;

file.seekg (4*(7296+(1*c)+offsets[a]));

file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;

temp_prov->lines.push_back (*temp_cords);
provinces.push_back (*temp_prov);

if (temp_cords->right == 18944)
{
delete temp_cords;
delete temp_prov;
break;
}
else
{
delete temp_cords;
delete temp_prov;
}
}
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.
any alternative approaches that'll let me do this quickly on a Pentium
3 800?

ps; if you want a better description of the file than i can provide
http://www.inferis.org/eu2/tbl/id.tbl.asp

Victor Bazarov · Sep 10, 2007

Steve said:
[..] here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.
any alternative approaches that'll let me do this quickly on a Pentium
3 800? [..]

Don't know of a particular comipiler, but if you could (a) instead of
erasing the 'b' right away mark it for erasure later, and then (b) do
remove_if() on your vector with one erase:

provinces.erase(std::remove_if(provinces.begin(),
provinces.end(),
is_specially_marked()),
provinces.end());

You're going to get the improvement you seek.

Also, while it's not necessarily an improvement, try 'std::list' with
your current algorithm (you won't be able to use index 'a' or 'b', but
use the iterators instead, same difference).

V

Jerry Coffin · Sep 10, 2007

[ ... ]

for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
{
province *temp_prov = new province;
cords *temp_cords = new cords;

There doesn't seem to be any reason to allocate these dynamically. I'd
just use:

province temp_prov;
temp_cords cords;

file.seekg (4*(7296+(1*c)+offsets[a]));

file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;

temp_prov->lines.push_back (*temp_cords);
provinces.push_back (*temp_prov);

Changing those to automatic variables means you can reduce this:

if (temp_cords->right == 18944)
{
delete temp_cords;
delete temp_prov;
break;
}
else
{
delete temp_cords;
delete temp_prov;
}

Down to simply:
if ((temp_cords->right == 18944)
break;

The next step would be to overload operator>> for struct cords to make
the mainline of the code a bit cleaner:

std::istream &operator>>(std::istream &is, cords &c) {
is.read((char *)c->left, sizeof(c->left));
is.read((char *)c->id, sizeof(c->id));
is.read((char *)c->right, sizeof(c->right));
return is;
}

Then your loop looks something like this:

for (int a=0; a<7296; a++) {
cords temp_cords = {};

for (int c=1; temp_cords.right != 18944; c+=2) {
province temp_prov;

file.seekg(4*(7296+c+offsets[a]));

file >> temp_cords;
temp_cords.line = a;

temp_prov.lines.push_back(temp_cord);
provinces.push_back(temp_prov);
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

Okay, if I understand this, the idea is to merge all the lines forom
provinces that have the same id into a single province. If that's the
case, I'd start by sorting based on id -- then all the lines with the
same id will be right next to each other. Being the way I am, I'd
probably also avoid doing the manipulation in-place. Instead, I'd modify
the previous loop to just create a vector of cords. Then, after I was
done reading those, I'd sort the cords by id, and then copy the data for
the cords into the vector of provinces.

Right now, you have an O(N*N) pair of loops, so it executes.

A quick sort (for example) will do the sort in approximately N*lg2(N)
operations, and then the merge will take about N operations.

Since N is 557,406, your pair of loops executes 310,701,448,836
iterations. The quick sort will take approximately 10,639,972 operations
and the merge takes 557,406 operations.

If we figure all the operations involved areabout the same speed, the
sort/merge should be about 29,000 _times_ as fast as what you're doing
now. Some of the operations in the sort are probably a little slower,
but I'd expect a _substantial_ improvement anyway.

LR · Sep 10, 2007

Steve said:
hi, i'm reading a binary data file, which is basically a list of 7296
offsets pointing to data blocks which represent scanlines containing
three numbers; leftx, id, rightx. i can read these fine and push them
into vectors using;

What kind of vector do you push these onto?

provinces? What type is that?

for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
> {

It's not clear to me why you're new'ing these each time through
the loop. Why not just make temps like,
cords temp_cords;

//province *temp_prov = new province;
//cords *temp_cords = new cords;

file.seekg (4*(7296+(1*c)+offsets[a]));

have to rewrite temp_cords-> as needed

file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;

province temp_prov;

temp_prov.lines.push_back (temp_cords);
provinces.push_back (temp_prov);

ok, here if you're deleteing in both cases,
if you use new, why not just delete
unconditionally?
delete temp_cords;
delete temp_prov;

if (temp_cords->right == 18944)
{
break;
}

and this whole else block has no real purpose
that I can see

else
{
delete temp_cords;
delete temp_prov;
}
}
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)

starting here from zero each time?
doesn't that mean that you'll count some
instances twice?

consider
.... b = a + 1; ....

BTW, if you have a vector,
std::vector<SomeType> vec;
then your for loop should probably look like:

{
if (provinces[a].id == provinces.id && a != b)

I don't know what provinces[a].id is, and I
don't know how it got initialized.

{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);

Click to expand...

what should the next value of b be as
you go through the loop?

}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.

Click to expand...

Then perhaps you should construct a smaller data test set so you can
test your algorithm.

any alternative approaches that'll let me do this quickly on a Pentium
3 800?

Click to expand...

And perhaps, more importantly correctly? I don't think that the
platform is relevant.

BTW, have you thought about std::map?

LR

upashu2 · Sep 10, 2007

Steve said:
while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but

Bad code using vectors.
Because vectors keep an array format, erasing on positions other than
the vector end also moves all the elements after the segment erased to
their new positions, which may not be a method as efficient as erasing
in other kinds of sequence containers (deque, list).And also this
invalidates all iterator and references to elements after that
position.

vectors are good only at removing element from end, not between. you
are doing N*N time erase on vector of large size.it would be slow.

Scott Gifford · Sep 10, 2007

[...]

i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

As others have mentioned, a vector probably isn't the right data
structure for this. An STL multimap sounds might do most of the work
for you, and the nonstandard but common hash_multimap would be
somewhat faster if you have one (there's probably one in Boost if
not). You would use the provice ID as the key. See:

http://www.sgi.com/tech/stl/Multimap.html
http://www.sgi.com/tech/stl/hash_multimap.html

-----Scott.

Steve Chow · Sep 10, 2007

thanks for all your help! if you're interested i'll post the code i
come up with after taking all your advice.

Jerry Coffin · Sep 11, 2007

[email protected] says... said:
As others have mentioned, a vector probably isn't the right data
structure for this.

I think a vector can work just fine for this -- but it requires doing
the job a bit differently. If (as I advised) you start with separate
vectors, so you copy the data from one to the other, and then clear the
first one all at once, you encounter no problem.

If the extra memory usage from temporarily having two complete copies of
the data is a problem, you can avoid that as well: iterate through the
first vector in reverse, and erase each item after its data has been
copied to the second vector. Since this only ever erases items from the
end, the erases are quick and efficient.

Help with Loop	0	Mar 30, 2023
Array of structs function pointer	10	Jul 16, 2023
Unable to get std::sort working	2	Mar 3, 2010
Vector Performance?	7	Jun 1, 2010
Compiler Performance with Compile-Time Array?	2	Oct 27, 2012
Performance problem with Pthread code	1	Mar 26, 2010
Lexical Analysis on C++	1	Oct 31, 2023
Code was not Working Please Help	1	May 30, 2023

Performance help

Steve Chow

Victor Bazarov

Jerry Coffin

LR

upashu2

Scott Gifford

Steve Chow

Jerry Coffin

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads