A
Alex Gerdemann
Hello,
I have spent a bunch of time converting a Java program I wrote to C++ in
order to improve performance, and have found that it is not necessarily
faster.
Specifically, I'm writing a circuit simulator, so I must first parse the
netlist file. The program goes through an input file, and makes a hash_set
of unique nodes (std::string's). The words are then sorted and numbered by
copying the hash_set into a vector and sorting. Then, I run a series of
binary searches on the vector to allocate lines in a matrix for each
element.
I've modified STL's lower_bound() to my getIndex() which functions like
Java's equivalent by returning negative indices when the element is not
found.
Also, Java provides a HashSet class that can hash any kind of element it may
contain. Since, the STL only hashes selected items I had to convert my
std::strings to const char*'s for hashing.
Other than that, my code in the Java and C++ is quite similar. However, the
parse of a 100k line file took around a second for Java, but more like 17s
for C++ . In fact, using std::set instead of hash_set actually improved
performance to about 15s in C++.
Any idea why there would be such a difference? In particular, I suspect
that my hash function may be slow from the call to c_str(). Does this have
to allocate new memory or just point to somewhere inside the object? I'm
kind of curious what Java actually uses for its hash. Any general
performace/other criticisms on my scheme are certainly welcome.
On the plus side there is a great direct sparse linear solver for C. I've
only been able to find a iterative sparse solver for Java. This makes the
C++ code overall much faster than Java, but I suspect that the parse stage
should run at least as fast or faster.
Thanks,
Alex Gerdemann
University of Illinois at Urbana-Champaign
For reference, I've included a sketch of the code
So I did something like the following:
struct hashSet {
__gnu_css::hash<const char*> h;
bool operator()(std::string& s) const {
return h(s.ctr());
}
};
inline int getIndex(const std::vector<std::string>& list, const std::string&
item) {
std::vector<std::string>::const_iterator i =
std::lower_bound(list.begin(), list.end(), item);
if(i==list.end()) {
return -(static_cast<int>(i-list.begin())+1);
} else if(*i != item) {
return -(static_cast<int>(i-list.begin())+1);
}
return static_cast<int>(i-list.begin());
}
__gnu_cxx::hash_set<std::string, hashString> nodeSet;
std::vector<std::string> nodeVector;
while(!done) {
//read a line of file
//split the line into words and push into a vector
//check for syntax errors
nextNode = some word in line;
nodeSet.insert(nextNode);
}
std::copy(nodeSet.begin(),node.end(),std::back_inserter(nodeVector));
std::sort(nodeVector.begin(),nodeVector.end());
while(!done) {
//read line from previously stored words
//more error checking
thisNode = some word in line
getIndex(nodeVector, thisNode);
//allocate an object representing element containing proper indices
representing nodes
//push pointer to element into a vector
}
I have spent a bunch of time converting a Java program I wrote to C++ in
order to improve performance, and have found that it is not necessarily
faster.
Specifically, I'm writing a circuit simulator, so I must first parse the
netlist file. The program goes through an input file, and makes a hash_set
of unique nodes (std::string's). The words are then sorted and numbered by
copying the hash_set into a vector and sorting. Then, I run a series of
binary searches on the vector to allocate lines in a matrix for each
element.
I've modified STL's lower_bound() to my getIndex() which functions like
Java's equivalent by returning negative indices when the element is not
found.
Also, Java provides a HashSet class that can hash any kind of element it may
contain. Since, the STL only hashes selected items I had to convert my
std::strings to const char*'s for hashing.
Other than that, my code in the Java and C++ is quite similar. However, the
parse of a 100k line file took around a second for Java, but more like 17s
for C++ . In fact, using std::set instead of hash_set actually improved
performance to about 15s in C++.
Any idea why there would be such a difference? In particular, I suspect
that my hash function may be slow from the call to c_str(). Does this have
to allocate new memory or just point to somewhere inside the object? I'm
kind of curious what Java actually uses for its hash. Any general
performace/other criticisms on my scheme are certainly welcome.
On the plus side there is a great direct sparse linear solver for C. I've
only been able to find a iterative sparse solver for Java. This makes the
C++ code overall much faster than Java, but I suspect that the parse stage
should run at least as fast or faster.
Thanks,
Alex Gerdemann
University of Illinois at Urbana-Champaign
For reference, I've included a sketch of the code
So I did something like the following:
struct hashSet {
__gnu_css::hash<const char*> h;
bool operator()(std::string& s) const {
return h(s.ctr());
}
};
inline int getIndex(const std::vector<std::string>& list, const std::string&
item) {
std::vector<std::string>::const_iterator i =
std::lower_bound(list.begin(), list.end(), item);
if(i==list.end()) {
return -(static_cast<int>(i-list.begin())+1);
} else if(*i != item) {
return -(static_cast<int>(i-list.begin())+1);
}
return static_cast<int>(i-list.begin());
}
__gnu_cxx::hash_set<std::string, hashString> nodeSet;
std::vector<std::string> nodeVector;
while(!done) {
//read a line of file
//split the line into words and push into a vector
//check for syntax errors
nextNode = some word in line;
nodeSet.insert(nextNode);
}
std::copy(nodeSet.begin(),node.end(),std::back_inserter(nodeVector));
std::sort(nodeVector.begin(),nodeVector.end());
while(!done) {
//read line from previously stored words
//more error checking
thisNode = some word in line
getIndex(nodeVector, thisNode);
//allocate an object representing element containing proper indices
representing nodes
//push pointer to element into a vector
}