Need help using STL to sort a vector of strings

A

Al Newton

I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?

Thanks for any help ... Al
 
F

Fraser Ross

bool pred(const std::string& lhs, const std::string& rhs)
{
return lhs.substr(0, sig_chars) < rhs.substr(0, sig_chars);
}
bool pred(const std::string& lhs, const std::string& rhs)
{
return std::lexicographical_compare(lhs.begin(), lhs.size()>10 ?
lhs.begin()+10 : lhs.end(), rhs.begin(), rhs.size()>10 ? rhs.begin()+10 :
rhs.end());
}

This would be considerably faster with not having to create substrings.

Fraser.
 
A

Al Newton

Victor Bazarov said:
Do you need to sort every string in a vector and then the vector
as well, or just every string, or just the vector based on the
first 10 chars in every string?

Example: given the vector

"sadflkjasdlksjdf"
"sadfslkjlskjdlfkj"
"lkjdfasdflkas"

sorting the first 10 chars in each string gives

"aaddfjklsslksjdf"
"adfjkllssskjdlfkj"
"addffjkllskas"

sorting the first 10 chars, then the entire vector gives


"aaddfjklsslksjdf"
"addffjkllskas"
"adfjkllssskjdlfkj"

sorting the entire vector using only 10 chars of each line gives

"lkjdfasdflkas"
"sadflkjasdlksjdf"
"sadfslkjlskjdlfkj"

So, make up your mind.


Of course. If you've only specified what exactly it is you want
to do...

Victor


Sorry about the ambiguity. Given as input the 3-element vector:
987654321A
123456789B
012345678C
I want the sorted output to be:
012345678C
123456789B
987654321A

Hope this clears up things ... Al
 
A

Andrew Koenig

Al> I apologize for ambiguities in the posting. I hope these were
Al> resolved in my reply to Bazarov. All strings begin with 10
Al> numerics that typically repeat 3 to 8 times in a vector of 10,000
Al> strings, but many numerics are unique within the vector. In all
Al> cases the numerics are followed by alphabetic data of no
Al> consequence. I don't have a quantitative answer as to run time,
Al> but the frown on my manager's face says, "Too long. Way, way too
Al> long."

If the numerics don't repeat much, then you're not going to save much
by restricting the comparisons. In fact, depending on how you do it,
you might even make things slower.

10,000 strings isn't that many on a modern processor. If it's taking
more than a small fraction of a second to sort them, I would be
inclined to look somewhere other than the comparison function for the
cause.
 
A

Al Newton

Mike Wahler said:
The sort function offers an optional argument, a
'predicate' function you can write to specify the
ordering:

<snip>

Thank you for the response. It was valuable and sincerely appreciated.

.... Al
 
A

Al Newton

Fraser Ross said:
bool pred(const std::string& lhs, const std::string& rhs)
{
return std::lexicographical_compare(lhs.begin(), lhs.size()>10 ?
lhs.begin()+10 : lhs.end(), rhs.begin(), rhs.size()>10 ? rhs.begin()+10 :
rhs.end());
}

This would be considerably faster with not having to create substrings.

Fraser.

You're quite right. 50000 repetitions of the original sort required
2664ms. 50000 repetitions of your modification reduced that to a mere
581ms.

Sincere thanks ... Al
 
J

Jerry Coffin

alnewton1 said:
I want to use STL's sort algorithm to sort a string vector. Some of
the strings are fairly long (300 to 400 chars) and the vector isn't
small (5,000 to 10,000 elements). Naturally, sorting time is long.
All I need sorted are the first 10 characters in the strings.
Limiting the number of characters processed should speed things up
considerably. Is there a way to do 1t?

I'm not sure about the "naturally sorting time is long" part -- sorting
10,000 strings should not be very slow. Though you'd need to use a
profiler to know, there's also a chance that you're actually copying
entire strings instead of just copying pointers when (for example) the
sort function decides it needs to swap a pair of strings. In this case,
you might gain considerably more by using pointers rather than raw
strings.

To answer your question directly, however, when you call std::sort you
can pass it a comparison routine you want used. Your comparison
function could simply create 10-character sub-strings and compare them,
or it could call strncmp to do the comparison, or (of course) it could
do the comparison itself. Which of these would be fastest is likely to
depend somewhat on how std::string is implemented on your machine.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,044
Latest member
RonaldNen

Latest Threads

Top