Efficiently Extracting Identical Values From A List/Array


Adam Hartshorne

As a result of a graphics-based algorithm, I have a list of indices into
a set of nodes.

I want to efficiently identify any node indices that are stored multiple
times in the array, and their locations in the array/list. Hence the
output would be a list of lists, each containing the group of
storage-array indices that point to the same node index.

This is obviously a trivial problem, but if my storage list is large and
the set of nodes is large (and hence there are lots of repeated indices),
this problem could become a bottleneck.

Adam
 

Thomas Maier-Komor

Adam said:
As a result of a graphics-based algorithm, I have a list of indices into
a set of nodes.

I want to efficiently identify any node indices that are stored multiple
times in the array, and their locations in the array/list. Hence the
output would be a list of lists, each containing the group of
storage-array indices that point to the same node index.

This is obviously a trivial problem, but if my storage list is large and
the set of nodes is large (and hence there are lots of repeated indices),
this problem could become a bottleneck.

Adam

What about STL's unique_copy?

Tom
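
For reference, a minimal sketch of how std::unique_copy could be applied here
(on made-up data; note that it only collapses adjacent duplicates, so the input
has to be sorted first, and it yields the distinct values rather than their
positions in the original array):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
    std::vector<int> nodes;              // the stored node indices (example data)
    nodes.push_back( 10 ); nodes.push_back( 10 );
    nodes.push_back( 4 );  nodes.push_back( 6 );
    nodes.push_back( 5 );  nodes.push_back( 5 );

    std::sort( nodes.begin(), nodes.end() );   // make duplicates adjacent

    std::vector<int> distinct;
    std::unique_copy( nodes.begin(), nodes.end(),
                      std::back_inserter( distinct ) );

    // prints each node index once: 4 5 6 10
    std::copy( distinct.begin(), distinct.end(),
               std::ostream_iterator<int>( std::cout, " " ) );
    std::cout << '\n';
    return 0;
}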
 

Ivan Vecerina

Adam Hartshorne said:
As a result of a graphics-based algorithm, I have a list of indices into
a set of nodes.

I want to efficiently identify any node indices that are stored multiple
times in the array, and their locations in the array/list. Hence the
output would be a list of lists, each containing the group of
storage-array indices that point to the same node index.

This is obviously a trivial problem, but if my storage list is large and
the set of nodes is large (and hence there are lots of repeated indices),
this problem could become a bottleneck.
An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

Likely to be more efficient:
std::vector< std::pair<int/*nodeIndex*/,int/*arrayIndex*/> > myList;
myList.reserve( theSizeOfTheArrayOfIndices );
// for each index:
myList.push_back( std::pair<int,int>( aNodeIndex, anArrayIndex ) );
std::sort( myList.begin(), myList.end() );
// --> scan for consecutive items with the same node index

A hash_map (or unordered_map) could be tested too, but I would expect
the vector version to be faster (just a guess...).
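
For concreteness, a self-contained sketch of the first ("easy") variant above;
std::map is substituted for std::multimap here, since multimap has no
operator[], and the data and names are made up:

#include <iostream>
#include <map>
#include <vector>

int main()
{
    // hypothetical linking array: arrayIndex -> nodeIndex (0-based positions)
    const int nodeOfLine[] = { 7, 7, 2, 9, 4, 4 };
    const int n = sizeof(nodeOfLine) / sizeof(nodeOfLine[0]);

    std::map< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
    for( int arrayIndex = 0; arrayIndex < n; ++arrayIndex )
        myList[ nodeOfLine[arrayIndex] ].push_back( arrayIndex );

    // report only node indices that occur more than once
    for( std::map< int, std::vector<int> >::const_iterator it = myList.begin();
         it != myList.end(); ++it )
    {
        if( it->second.size() < 2 ) continue;
        std::cout << "node " << it->first << ":";
        for( std::vector<int>::const_iterator v = it->second.begin();
             v != it->second.end(); ++v )
            std::cout << ' ' << *v;
        std::cout << '\n';
    }
    return 0;
}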


Ivan
 

Adam Hartshorne

Ivan said:
As a result of a graphics-based algorithm, I have a list of indices into
a set of nodes.

I want to efficiently identify any node indices that are stored multiple
times in the array, and their locations in the array/list. Hence the
output would be a list of lists, each containing the group of
storage-array indices that point to the same node index.

This is obviously a trivial problem, but if my storage list is large and
the set of nodes is large (and hence there are lots of repeated indices),
this problem could become a bottleneck.

An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

Likely to be more efficient:
std::vector< std::pair<int/*nodeIndex*/,int/*arrayIndex*/> > myList;
myList.reserve( theSizeOfTheArrayOfIndices );
// for each index:
myList.push_back( std::pair<int,int>( aNodeIndex, anArrayIndex ) );
std::sort( myList.begin(), myList.end() );
// --> scan for consecutive items with the same node index

A hash_map (or unordered_map) could be tested too, but I would expect
the vector version to be faster (just a guess...).


Ivan

Maybe I'm missing something, but using this approach
An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

will give me the list of lists, but I only want to consider those nodes
which are mentioned multiple times in the storage array. The above will
build me the list of lists keyed on node index, with a list of array
indices for each node.

I would then have to search/sort the whole myList to isolate the entries
that have multiple values stored. Is that correct?

Adam
 

Karl Heinz Buchegger

Adam said:
As a result of a graphics-based algorithm, I have a list of indices into
a set of nodes.

I want to efficiently identify any node indices that are stored multiple
times in the array, and their locations in the array/list. Hence the
output would be a list of lists, each containing the group of
storage-array indices that point to the same node index.

This is obviously a trivial problem, but if my storage list is large and
the set of nodes is large (and hence there are lots of repeated indices),
this problem could become a bottleneck.

Yep. That definitely will become an issue for large point sets.
What you need to do:
sort the points(*) and keep track of where each point was in the
original data structure.
You might want to use a helper data structure for that.

After the sort has been done, all points with identical coordinates
are consecutive, and the additional information tells you where
each one was in the original data set.

(*) sorting criterion:
    if x_coordinates are equal
        if y_coordinates are equal
            return z1 < z2
        else
            return y1 < y2
    else
        return x1 < x2
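
As a sketch, that criterion written as a C++ comparison function might look
like this (the Point struct is assumed here, it is not part of the post):

struct Point { double x, y, z; };

// lexicographic comparison: by x, then y, then z
bool lessXYZ( const Point& a, const Point& b )
{
    if( a.x != b.x ) return a.x < b.x;
    if( a.y != b.y ) return a.y < b.y;
    return a.z < b.z;
}

// e.g. std::sort( points.begin(), points.end(), lessXYZ );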
 

Adam Hartshorne

Karl said:
Yep. That definitely will become an issue for large point sets.
What you need to do:
sort the points(*) and keep track of where each point was in the
original data structure.
You might want to use a helper data structure for that.

After the sort has been done, all points with identical coordinates
are consecutive, and the additional information tells you where
each one was in the original data set.

(*) sorting criterion:
    if x_coordinates are equal
        if y_coordinates are equal
            return z1 < z2
        else
            return y1 < y2
    else
        return x1 < x2

I think you may have misunderstood. There are no actual point
coordinates. There is simply a list of points, a list of lines, and a
list that is used to link lines to the points.

What I am concerned with is the linking list. So, say we have the
following:

I = {10,10,4,6,5,5}

That says lines 1 and 2 are linked to node 10, line 3 to node 4, and so on.

What I want as a result of the search is

O = {10{1,2}, 5{5,6}}
 

Karl Heinz Buchegger

Adam said:
I think you may have misunderstood.

Maybe.

There are no actual point coordinates. There is simply a list of points,
a list of lines, and a list that is used to link lines to the points.

What I am concerned with is the linking list. So, say we have the
following:

I = {10,10,4,6,5,5}

That says lines 1 and 2 are linked to node 10, line 3 to node 4, and so on.

What I want as a result of the search is

O = {10{1,2}, 5{5,6}}

Same strategy.
Set up a helper data structure:

struct SortHelper
{
    int NodeIndex;
    int OriginalPosition;
};

and create an array (or whatever) of that:

I = { 10, 4, 8, 10, 4, 5 }

becomes

{ 10, 1 }
{ 4, 2 }
{ 8, 3 }
{ 10, 4 }
{ 4, 5 }
{ 5, 6 }

Now sort that array according to NodeIndex:

{ 4, 2 }
{ 4, 5 }
{ 5, 6 }
{ 8, 3 }
{ 10, 1 }
{ 10, 4 }

and scan through it: there are two consecutive '4' nodes in the list, and they
appeared in the original I at positions 2 and 5. '5' occurs only once and is
thus of no interest to you (if I understand correctly), same for '8'. But then
there is 10, which occurs twice in I, at positions 1 and 4.

The strategy is always the same. If you need to compare each element with each
other element in a data structure, you have a potential O(n^2) algorithm. If
possible (and often it is), sort that thing such that equal elements end up
consecutive. Sorting is of order O(n*log(n)), plus an additional O(n) for
running through the data structure and sorting things out. Much better
than O(n^2) for large values of n.
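
For reference, a compilable sketch of this sort-and-scan approach (untested;
it follows the SortHelper struct, example data, and 1-based positions from the
post above, the rest is filled in):

#include <algorithm>
#include <iostream>
#include <vector>

struct SortHelper
{
    int NodeIndex;
    int OriginalPosition;
};

bool byNode( const SortHelper& a, const SortHelper& b )
{
    return a.NodeIndex < b.NodeIndex;
}

int main()
{
    const int I[] = { 10, 4, 8, 10, 4, 5 };
    const int n = sizeof(I) / sizeof(I[0]);

    // build the helper array, remembering 1-based original positions
    std::vector<SortHelper> helpers;
    for( int i = 0; i < n; ++i )
    {
        SortHelper h = { I[i], i + 1 };
        helpers.push_back( h );
    }

    std::sort( helpers.begin(), helpers.end(), byNode );

    // scan for runs of equal node indices; report runs of length >= 2
    // prints: "node 4: 2 5" and "node 10: 1 4"
    for( int i = 0; i < n; )
    {
        int j = i;
        while( j < n && helpers[j].NodeIndex == helpers[i].NodeIndex )
            ++j;
        if( j - i >= 2 )
        {
            std::cout << "node " << helpers[i].NodeIndex << ":";
            for( int k = i; k < j; ++k )
                std::cout << ' ' << helpers[k].OriginalPosition;
            std::cout << '\n';
        }
        i = j;
    }
    return 0;
}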
 

Ivan Vecerina

Adam Hartshorne said:
Maybe I'm missing something, but using this approach

An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
[NB: I actually meant to write std::map here, not std::multimap.]
// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

will give me the list of lists, but I only want to consider those nodes
which are mentioned multiple times in the storage array. The above will
build me the list of lists keyed on node index, with a list of array
indices for each node.

I would then have to search/sort the whole myList to isolate the entries
that have multiple values stored.
Is that correct?
Yes: sorting first tends to be the fastest way to find
identical values in a list.

That said, in your case the aNodeIndex values lie in a known 0-based
interval. Because of that, you could probably use a faster approach:
// initial table filled with -1 to say no arrayIndex points to that node yet
std::vector<int> nodeToFirstInd( maxNodeIndex, -1 );

// this will store only nodes with multiple indices
std::map< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > multiLinked;

for(....) // for each arrayIndex, nodeIndex pair:
{
    if( nodeToFirstInd[nodeIndex] == -1 )
        nodeToFirstInd[nodeIndex] = arrayIndex; // mark node as 'used'
    else {
        std::vector<int/*arrayIndex*/>& list = multiLinked[nodeIndex];
        if( list.empty() ) // put the initial item in
            list.push_back( nodeToFirstInd[nodeIndex] );
        list.push_back( arrayIndex ); // add the new array index
    }
}

// now multiLinked contains what you want


Sorry the code samples are a mess - just written in a rush.
I hope it is understandable and helpful, though.
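
For reference, a cleaned-up, self-contained version of the same idea (the
example data, names, and 0-based indexing here are assumptions, not from the
original post):

#include <iostream>
#include <map>
#include <vector>

int main()
{
    // hypothetical linking array: arrayIndex -> nodeIndex
    const int linking[] = { 2, 2, 0, 3, 1, 1 };
    const int n = sizeof(linking) / sizeof(linking[0]);
    const int numNodes = 4;   // node indices are 0 .. numNodes-1

    // -1 means "no array index seen yet for this node"
    std::vector<int> nodeToFirstInd( numNodes, -1 );

    // only nodes referenced more than once end up in here
    std::map< int, std::vector<int> > multiLinked;

    for( int arrayIndex = 0; arrayIndex < n; ++arrayIndex )
    {
        const int nodeIndex = linking[arrayIndex];
        if( nodeToFirstInd[nodeIndex] == -1 )
        {
            nodeToFirstInd[nodeIndex] = arrayIndex;   // first sighting
        }
        else
        {
            std::vector<int>& list = multiLinked[nodeIndex];
            if( list.empty() )                        // put the initial item in
                list.push_back( nodeToFirstInd[nodeIndex] );
            list.push_back( arrayIndex );             // add the new array index
        }
    }

    // print the groups, e.g. "node 1: 4 5" and "node 2: 0 1"
    for( std::map< int, std::vector<int> >::const_iterator it = multiLinked.begin();
         it != multiLinked.end(); ++it )
    {
        std::cout << "node " << it->first << ":";
        for( std::vector<int>::const_iterator v = it->second.begin();
             v != it->second.end(); ++v )
            std::cout << ' ' << *v;
        std::cout << '\n';
    }
    return 0;
}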

Ivan
 
