Performance help

S

Steve Chow

hi, i'm reading a binary data file, which is basically a list of 7296
offsets pointing to data blocks which represent scanlines containing
three numbers; leftx, id, rightx. i can read these fine and push them
into vectors using;

for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
{
province *temp_prov = new province;
cords *temp_cords = new cords;

file.seekg (4*(7296+(1*c)+offsets[a]));

file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;

temp_prov->lines.push_back (*temp_cords);
provinces.push_back (*temp_prov);

if (temp_cords->right == 18944)
{
delete temp_cords;
delete temp_prov;
break;
}
else
{
delete temp_cords;
delete temp_prov;
}
}
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.
any alternative approaches that'll let me do this quickly on a Pentium
3 800?

ps; if you want a better description of the file than i can provide
http://www.inferis.org/eu2/tbl/id.tbl.asp
 
V

Victor Bazarov

Steve said:
[..] here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.
any alternative approaches that'll let me do this quickly on a Pentium
3 800? [..]


Don't know of a particular comipiler, but if you could (a) instead of
erasing the 'b' right away mark it for erasure later, and then (b) do
remove_if() on your vector with one erase:

provinces.erase(std::remove_if(provinces.begin(),
provinces.end(),
is_specially_marked()),
provinces.end());

You're going to get the improvement you seek.

Also, while it's not necessarily an improvement, try 'std::list' with
your current algorithm (you won't be able to use index 'a' or 'b', but
use the iterators instead, same difference).

V
 
J

Jerry Coffin

[ ... ]
for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
{
province *temp_prov = new province;
cords *temp_cords = new cords;

There doesn't seem to be any reason to allocate these dynamically. I'd
just use:

province temp_prov;
temp_cords cords;
file.seekg (4*(7296+(1*c)+offsets[a]));

file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;

temp_prov->lines.push_back (*temp_cords);
provinces.push_back (*temp_prov);

Changing those to automatic variables means you can reduce this:
if (temp_cords->right == 18944)
{
delete temp_cords;
delete temp_prov;
break;
}
else
{
delete temp_cords;
delete temp_prov;
}

Down to simply:
if ((temp_cords->right == 18944)
break;

The next step would be to overload operator>> for struct cords to make
the mainline of the code a bit cleaner:

std::istream &operator>>(std::istream &is, cords &c) {
is.read((char *)c->left, sizeof(c->left));
is.read((char *)c->id, sizeof(c->id));
is.read((char *)c->right, sizeof(c->right));
return is;
}

Then your loop looks something like this:

for (int a=0; a<7296; a++) {
cords temp_cords = {};

for (int c=1; temp_cords.right != 18944; c+=2) {
province temp_prov;

file.seekg(4*(7296+c+offsets[a]));

file >> temp_cords;
temp_cords.line = a;

temp_prov.lines.push_back(temp_cord);
provinces.push_back(temp_prov);
}
while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}


Okay, if I understand this, the idea is to merge all the lines forom
provinces that have the same id into a single province. If that's the
case, I'd start by sorting based on id -- then all the lines with the
same id will be right next to each other. Being the way I am, I'd
probably also avoid doing the manipulation in-place. Instead, I'd modify
the previous loop to just create a vector of cords. Then, after I was
done reading those, I'd sort the cords by id, and then copy the data for
the cords into the vector of provinces.

Right now, you have an O(N*N) pair of loops, so it executes.

A quick sort (for example) will do the sort in approximately N*lg2(N)
operations, and then the merge will take about N operations.

Since N is 557,406, your pair of loops executes 310,701,448,836
iterations. The quick sort will take approximately 10,639,972 operations
and the merge takes 557,406 operations.

If we figure all the operations involved areabout the same speed, the
sort/merge should be about 29,000 _times_ as fast as what you're doing
now. Some of the operations in the sort are probably a little slower,
but I'd expect a _substantial_ improvement anyway.
 
L

LR

Steve said:
hi, i'm reading a binary data file, which is basically a list of 7296
offsets pointing to data blocks which represent scanlines containing
three numbers; leftx, id, rightx. i can read these fine and push them
into vectors using;

What kind of vector do you push these onto?

provinces? What type is that?

for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
> {

It's not clear to me why you're new'ing these each time through
the loop. Why not just make temps like,
cords temp_cords;
//province *temp_prov = new province;
//cords *temp_cords = new cords;

file.seekg (4*(7296+(1*c)+offsets[a]));
have to rewrite temp_cords-> as needed
file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;
province temp_prov;
temp_prov.lines.push_back (temp_cords);
provinces.push_back (temp_prov);

ok, here if you're deleteing in both cases,
if you use new, why not just delete
unconditionally?
delete temp_cords;
delete temp_prov;
if (temp_cords->right == 18944)
{
break;
}
and this whole else block has no real purpose
that I can see
else
{
delete temp_cords;
delete temp_prov;
}
}
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)

starting here from zero each time?
doesn't that mean that you'll count some
instances twice?

consider
.... b = a + 1; ....

BTW, if you have a vector,
std::vector<SomeType> vec;
then your for loop should probably look like:
{
if (provinces[a].id == provinces.id && a != b)


I don't know what provinces[a].id is, and I
don't know how it got initialized.
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);


what should the next value of b be as
you go through the loop?
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.

Then perhaps you should construct a smaller data test set so you can
test your algorithm.
any alternative approaches that'll let me do this quickly on a Pentium
3 800?

And perhaps, more importantly correctly? I don't think that the
platform is relevant.

BTW, have you thought about std::map?



LR
 
U

upashu2

Steve said:
while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces.id && a != b)
{
provinces[a].lines.push_back(provinces.lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but


Bad code using vectors.
Because vectors keep an array format, erasing on positions other than
the vector end also moves all the elements after the segment erased to
their new positions, which may not be a method as efficient as erasing
in other kinds of sequence containers (deque, list).And also this
invalidates all iterator and references to elements after that
position.

vectors are good only at removing element from end, not between. you
are doing N*N time erase on vector of large size.it would be slow.
 
S

Scott Gifford

[...]
i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

As others have mentioned, a vector probably isn't the right data
structure for this. An STL multimap sounds might do most of the work
for you, and the nonstandard but common hash_multimap would be
somewhat faster if you have one (there's probably one in Boost if
not). You would use the provice ID as the key. See:

http://www.sgi.com/tech/stl/Multimap.html
http://www.sgi.com/tech/stl/hash_multimap.html

-----Scott.
 
S

Steve Chow

thanks for all your help! if you're interested i'll post the code i
come up with after taking all your advice.
 
J

Jerry Coffin

[email protected] says... said:
As others have mentioned, a vector probably isn't the right data
structure for this.

I think a vector can work just fine for this -- but it requires doing
the job a bit differently. If (as I advised) you start with separate
vectors, so you copy the data from one to the other, and then clear the
first one all at once, you encounter no problem.

If the extra memory usage from temporarily having two complete copies of
the data is a problem, you can avoid that as well: iterate through the
first vector in reverse, and erase each item after its data has been
copied to the second vector. Since this only ever erases items from the
end, the erases are quick and efficient.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top