My Explode function(s) are too slow.

FFMG

Hi,

I need the PHP equivalent of explode in one of my apps.
I read a very big file and "explode" each line to fill a structure.

Reading the file (19 MB) takes about 10 seconds (I guess I will also
need to streamline the way I read each line). But when I 'explode'
each line, the process takes about 140 seconds.
This is what I have tried so far...

//-----------------------function A--------------------------
std::vector<std::string> explode(
    const std::string s,
    const std::string separator
)
{
    const int iPit = separator.length();
    std::vector<std::string> ret;
    int iPos = s.find(separator, 0);
    int iStart = 0;

    while(iPos>-1)
    {
        if(iPos!=0){
            ret.push_back(s.substr(iStart,iPos-iStart));
            iStart = (iPos+iPit);
        }
        iPos = s.find(separator, iStart);
    } // end while

    // add the last item if need be.
    if(iStart != s.length()){
        ret.push_back(s.substr(iStart));
    }
    return ret;
}

//-----------------------function B--------------------------
std::vector<std::string> explode(
    const char* s,
    const char separator
)
{
    std::vector<std::string> ret;

    char seps[] = {separator};
    char *token = strtok( (char*)s, seps );
    while( token != NULL )
    {
        ret.push_back( token );
        token = strtok( NULL, seps );
    }
    return ret;
}
//--------------------------------------------------------------------

Function B is slightly faster than function A.

How could I speed up my Explode?

Many thanks

FFMG
 
Alf P. Steinbach

I'd first try to

* Read the complete file into a buffer in one or a very few large
  gulps -- that typically improves the reading by at least one
  order of magnitude.

* Analyze whether an /explicit representation/ of the complete token
  set is really required, or whether you can just proceed by handing
  one token at a time up to calling code or down to code that you call.

* If explicit representation is required, and performance really
  suffers, I'd first try the obvious step of checking whether compiler
  options could fix the performance; second, whether a rewrite to a
  "get" function (not returning the result via function result but
  via a reference argument) would fix it; third, I'd consider things
  such as a vector of StringSpan objects, each such object containing
  just a pointer to the start and end of a substring.
 
Rolf Magnus

FFMG said:
> Hi,
>
> I need the php equivalent of explode in one of my app.
> I read a very big file and "explode" each line to fill a structure.

You are aware that not everyone knows PHP? What does "explode each line"
mean?

> The reading of the file, 19Mb, (I will also need to streamline the way
> I read each line I guess), takes about 10 seconds. But when I 'explode'
> each line the process takes about 140 seconds.
> This is what I have tried so far...
>
> //-----------------------function A--------------------------
> std::vector<std::string> explode(
>     const std::string s,
>     const std::string separator

You should pass those strings by reference.

> )
> {
>     const int iPit = separator.length();
>     std::vector<std::string> ret;
>     int iPos = s.find(separator, 0);
>     int iStart = 0;

The return type of std::string::find and std::string::length is
std::string::size_type, not int.

>     while(iPos>-1)

find() returns std::string::npos if nothing is found.

>     {
>         if(iPos!=0){
>             ret.push_back(s.substr(iStart,iPos-iStart));
>             iStart = (iPos+iPit);
>         }
>         iPos = s.find(separator, iStart);
>     } // end while
>
>     // add the last item if need be.
>     if(iStart != s.length()){
>         ret.push_back(s.substr(iStart));
>     }
>     return ret;
> }
>
> //-----------------------function B--------------------------
> std::vector<std::string> explode(
>     const char* s,
>     const char separator
> )
> {
>     std::vector<std::string> ret;
>
>     char seps[] = {separator};
>     char *token = strtok( (char*)s, seps );
>     while( token != NULL )
>     {
>         ret.push_back( token );
>         token = strtok( NULL, seps );
>     }
>     return ret;
> }
> //--------------------------------------------------------------------
>
> Function B is slightly faster than function A.

The second one just uses a single char as separator, while the first one
uses a whole string. If a single char is enough, you could implement A as:

#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> explode(const std::string& s, const char separator)
{
    std::vector<std::string> ret;
    std::stringstream stream(s);
    std::string element;
    while (std::getline(stream, element, separator))
        ret.push_back(element);
    return ret;
}
 
Vaclav Haisman

FFMG wrote, On 8.6.2006 10:57:
> Hi,
>
> I need the php equivalent of explode in one of my app.
> I read a very big file and "explode" each line to fill a structure.
>
> The reading of the file, 19Mb, (I will also need to streamline the way
> I read each line I guess), takes about 10 seconds. But when I 'explode'
> each line the process takes about 140 seconds.
> This is what I have tried so far...
>
> //-----------------------function A--------------------------
> std::vector<std::string> explode(
>     const std::string s,
>     const std::string separator

Pass the two arguments by reference instead of by value.

> )
> {
>     const int iPit = separator.length();
>     std::vector<std::string> ret;
>     int iPos = s.find(separator, 0);

Better use std::string::size_type.

>     int iStart = 0;
>
>     while(iPos>-1)

Don't use -1, better use std::string::npos.

>     {
>         if(iPos!=0){
>             ret.push_back(s.substr(iStart,iPos-iStart));
>             iStart = (iPos+iPit);
>         }
>         iPos = s.find(separator, iStart);
>     } // end while
>
>     // add the last item if need be.
>     if(iStart != s.length()){
>         ret.push_back(s.substr(iStart));
>     }
>     return ret;
> }
>
> //-----------------------function B--------------------------
> std::vector<std::string> explode(
>     const char* s,
>     const char separator
> )
> {
>     std::vector<std::string> ret;
>
>     char seps[] = {separator};
>     char *token = strtok( (char*)s, seps );
>     while( token != NULL )
>     {
>         ret.push_back( token );
>         token = strtok( NULL, seps );
>     }
>     return ret;
> }
> //--------------------------------------------------------------------
>
> Function B is slightly faster than function A.

Probably because it does not copy the whole string s on each call.
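
Taken together, the review comments on function A (pass by reference, use std::string::size_type, compare against std::string::npos) lead to something like the sketch below. It is not code posted in the thread, and the name explode_by_ref is made up for illustration. Two behaviors are assumptions on my part: the start index always advances past a separator (the original `if(iPos!=0)` check never advances when a line begins with the separator, so the loop would not terminate), and an empty separator simply returns the whole string as one element.

```cpp
#include <string>
#include <vector>

// Hypothetical rewrite of function A with the suggested fixes applied.
std::vector<std::string> explode_by_ref(const std::string& s,
                                        const std::string& separator)
{
    std::vector<std::string> ret;
    if (separator.empty()) {      // find("") matches everywhere; avoid looping forever
        ret.push_back(s);
        return ret;
    }

    std::string::size_type start = 0;
    std::string::size_type pos = s.find(separator, start);
    while (pos != std::string::npos) {
        ret.push_back(s.substr(start, pos - start)); // may be an empty field
        start = pos + separator.length();            // always move past the separator
        pos = s.find(separator, start);
    }

    // Add the last item if anything is left after the final separator.
    if (start != s.length())
        ret.push_back(s.substr(start));
    return ret;
}
```

This still allocates one std::string per token, so it addresses the correctness points and the argument copies rather than the allocation cost itself.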
 
Tom Widmer

FFMG said:
> Hi,
>
> I need the php equivalent of explode in one of my app.
> I read a very big file and "explode" each line to fill a structure.
>
> The reading of the file, 19Mb, (I will also need to streamline the way
> I read each line I guess), takes about 10 seconds. But when I 'explode'
> each line the process takes about 140 seconds.

The problem is memory allocation overhead.

> How could I speed up my Explode?

Something like:

void explode(
    std::vector<char const*>& ret,
    std::string& s,
    char const separator
)
{
    ret.reserve(s.size() / 10u); // rough guess at the number of tokens
    std::string::size_type const iPit = 1;
    std::string::size_type iPos = s.find(separator, 0);
    std::string::size_type iStart = 0;

    while(iPos != std::string::npos)
    {
        if(iPos!=0){
            s[iPos] = '\0'; //null it out
            ret.push_back(&s[iStart]);
        }
        iStart = (iPos+iPit); // always advance, even past a leading separator
        iPos = s.find(separator, iStart);
    } // end while

    // add the last item if need be.
    if(iStart != s.length()){
        ret.push_back(&s[iStart]);
    }
}

ret will only be valid for as long as the passed s is not modified, and
note that s is modified by the call.

Tom
 
Roland Pibinger

> I need the php equivalent of explode in one of my app.
> I read a very big file and "explode" each line to fill a structure.

I guess 'explode' means tokenize.

> The reading of the file, 19Mb, (I will also need to streamline the way
> I read each line I guess), takes about 10 seconds. But when I 'explode'
> each line the process takes about 140 seconds.
> This is what I have tried so far...

Others have already pointed out that you copy objects unnecessarily
instead of using references (only old-fashioned 'modern' C++
programmers copy everything by value). One performance inhibitor for
large data is also std::vector. You must reserve (with
vector.reserve()) enough space for the vector (estimate!) otherwise it
reallocates often and performs many unnecessary copies.
Some tokenize implementations are discussed in:
http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/aa4daafacd01ce26
(see esp. John Potter's solution and the following).
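
As a rough illustration of the reserve() advice (not taken from the linked discussion), the sketch below counts the separators first so the vector is sized once per line. The name explode_reserved and the choice of an exact count rather than an estimate are my own assumptions; the result is returned through a reference argument so the same vector can be reused for every line.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: tokenize `line` into `out`, reserving space up front.
void explode_reserved(std::vector<std::string>& out,
                      const std::string& line,
                      char separator)
{
    out.clear(); // keeps the capacity obtained on earlier calls

    // There is at most one more field than there are separators.
    const std::size_t separators = static_cast<std::size_t>(
        std::count(line.begin(), line.end(), separator));
    out.reserve(separators + 1);

    std::string::size_type start = 0;
    std::string::size_type pos = line.find(separator, start);
    while (pos != std::string::npos) {
        out.push_back(line.substr(start, pos - start));
        start = pos + 1;
        pos = line.find(separator, start);
    }

    // Add the last item if anything follows the final separator.
    if (start != line.length())
        out.push_back(line.substr(start));
}
```

Counting costs one extra pass over the line; whether that beats a fixed estimate such as `line.size() / 10` would have to be measured on the real data.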

Best wishes,
Roland Pibinger
 
