String tokens/parsing


Christopher Benson-Manica

(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?
 

Thomas Matthews

Christopher said:
(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?

The strtok function will find tokens in the string and
modify your string.

Perhaps strchr to find the '.'.

Another function is sscanf. I've heard that you can set
the format descriptor string so that it parses correctly.
(which may be a difficult task). I'm sure if you post
to news:comp.lang.c, Dan Pop will show the way.
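
A minimal sketch of the sscanf route, assuming the input holds exactly three '.'-separated fields (the format string would need adjusting for any other count). The name parse_three is hypothetical. Note that %u skips leading whitespace and quietly accepts a minus sign, so this is not a strict validator.

```cpp
#include <cstdio>

// Hypothetical helper: parses "a.b.c" into three unsigned ints.
// Returns true only if all three fields convert and nothing follows
// the third field.
bool parse_three(const char* s, unsigned& a, unsigned& b, unsigned& c)
{
    char extra; // only assigned if something trails the third number
    // sscanf returns the number of items assigned: 3 for a clean match,
    // 4 if a trailing character was picked up by %c.
    return std::sscanf(s, "%u.%u.%u%c", &a, &b, &c, &extra) == 3;
}
```

Called as parse_three(argv[1], uDeptTime, uArrvTime, uMaxGT), a false return would take the place of the "// error" branches.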

As for C++, you may want to convert to a std::string
and use the "find" methods and maybe a stringstream
for converting to an int.
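
A rough sketch of that idea, assuming the tokens should be plain decimal digits and that a fixed-size array is an acceptable place to put the results. The names to_uint and split_uints are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: converts one token to unsigned via a stringstream.
// Returns false if the token is empty, contains a non-digit, or overflows.
bool to_uint(const std::string& token, unsigned& out)
{
    if (token.empty() ||
        token.find_first_not_of("0123456789") != std::string::npos)
        return false;
    std::istringstream in(token);
    return !(in >> out).fail();
}

// Hypothetical helper: splits cstr on '.' with std::string::find and stores
// at most max_count values in out[]. Returns the number of values stored,
// or -1 if a token is not a valid number. Tokens beyond max_count are ignored.
int split_uints(const char* cstr, unsigned out[], int max_count)
{
    const std::string s(cstr);
    std::string::size_type start = 0;
    int count = 0;
    while (count < max_count) {
        const std::string::size_type dot = s.find('.', start);
        // A length of npos makes substr take the rest of the string.
        const std::string token = s.substr(
            start, dot == std::string::npos ? dot : dot - start);
        if (!to_uint(token, out[count]))
            return -1;
        ++count;
        if (dot == std::string::npos)
            break; // that was the last token
        start = dot + 1;
    }
    return count;
}
```

The digit check in to_uint is there because operator>> on an unsigned would otherwise accept input such as "-1" or "12abc".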

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 

Christopher Benson-Manica

Thomas Matthews said:
The strtok function will find tokens in the string and
modify your string.
Perhaps strchr to find the '.'
Another function is sscanf. I've heard that you can set
the format descriptor string so that it parses correctly.
(which may be a difficult task). I'm sure if you post
to news:comp.lang.c, Dan Pop will show the way.

Believe me, I'm perfectly capable of doing this with C, and am no
stranger to comp.lang.c. I posted here specifically because I'm
interested in improving on the C methods, if that is in fact possible.
The original C code (we're stuck in a "C-style-C++" paradigm,
unfortunately) strikes me as being distinctively hack-y.
 

Default User

Christopher said:
Believe me, I'm perfectly capable of doing this with C, and am no
stranger to comp.lang.c. I posted here specifically because I'm
interested in improving on the C methods, if that is in fact possible.
The original C code (we're stuck in a "C-style-C++" paradigm,
unfortunately) strikes me as being distinctively hack-y.


Then you need to be more specific about what you already have, and what
your requirements are. Do you object to converting the C-style strings
to std::strings or std::stringstreams? What limitations does your
"C-style-C++" paradigm impose? To what extent can you deviate from the C
standard library?

You have a very vague question.




Brian Rodenborn
 

Christopher Benson-Manica

Default User said:
Then you need to be more specific about what you already have,

The original code looked like the following unholy mess (which I did
not write):

// unsigned int typedef'ed as uint
// assume appropriate #includes

char sTemp[64];
uint uDeptTime, uArrvTime;
if( argc>=2 ) {
if( sameas(argv[1], "") ) { // sameas ~ strcmp() with flavor
// error
}
strncpy( sTemp, argv[1], sizeof(sTemp) );
cp=strchr(sTemp, '.');
if( cp == NULL ) {
// error
}
uint const lene=strlen(cp);
uint const lenb=strlen(sTemp);
uint const lenr=lenb-lene;
sTemp[lenr]='\0';
uDeptTime=(uint)atoi(sTemp);
if( cp+1 ) {
uArrvTime=(uint)atoi(cp+1);
}
}

I wrote the following as a first approximation to a decent solution:

char *cp;
vector<uint> v;
char sTemp[64];
uint uDeptTime, uArrvTime, uMaxGT; // new variable
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
// I don't know the details
}
if( v.size() < 3 ) {
// error
}
uDeptTime=v[0];
uArrvTime=v[1];
uMaxGT=v[2];
and what your requirements are.

Straightline code only. (no additional class declarations)
Do you object to converting the C-style strings to std::strings or
std::stringstreams?

I'd love to use std::strings and/or std::stringstreams if they offer a
cleaner (not necessarily more "efficient") solution.
What limitations does your "C-style-C++" paradigm impose?

The STL is never used in our code, and vectors in particular seem to
be frowned upon. std::stringstreams might be pushing the envelope.
You have a very vague question.

Better?
 

Evan Carew

Chris,
[snip]
<sigh>

Sometimes those responding to messages in this group can be a little...
well... pedantic. If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.

The Boost web site contains many useful template libraries that
complement the STL. As it turns out, many of the STL authors contribute
to this site. The way they put it, many of their submissions didn't make
it into the standard, but are nonetheless useful and worthy of use.
 

Christopher Benson-Manica

Evan Carew said:
Sometimes those responding to messages in this group can be a little...
well... pedantic.

And I wouldn't want it any other way :)
If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.

Unfortunately, boost is out of the question here. I'm working at a
company where any code not written in-house (i.e., by my boss) is
considered suspect, so in effect I'm trying to sneak some "real" C++
in the code here and there below the radar. There are times where
std::strings can really make life simple, so I toss them in
occasionally, but for the most part C-style strings rule the day.
 

Christopher Benson-Manica

Christopher Benson-Manica said:
I wrote the following as a first approximation to a decent solution:

And then (now) realized that it doesn't work...
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
// I don't know the details
}

*sigh*

for( cp=argv[1] ; cp && cp++ ; cp=strchr(cp,'.') ) {
v.push_back( atoi(cp) );
}
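
One thing to watch in the loop as posted: cp++ runs before the first atoi() call, so the first character of argv[1] is skipped. A minimal sketch of a variant that parses the token first and only then looks for the next separator follows. It assumes the plain std::atoi rather than the in-house wrapper, and the name parse_dotted is hypothetical. Like atoi itself, it does no validation: an empty or non-numeric token simply yields 0.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: collects the '.'-separated numbers found in s.
std::vector<unsigned> parse_dotted(const char* s)
{
    std::vector<unsigned> v;
    for (const char* cp = s; cp != NULL; ) {
        // Parse the token that starts at cp; atoi stops at the '.'.
        v.push_back(static_cast<unsigned>(std::atoi(cp)));
        // Find the next separator and step past it, if there is one.
        cp = std::strchr(cp, '.');
        if (cp != NULL)
            ++cp;
    }
    return v;
}
```

With std::strtoul in place of atoi, the end pointer it reports could be used to reject tokens that are not numbers.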
 

Default User

Evan said:
Sometimes those responding to messages in this group can be a little...
well... pedantic.

I'm not sure if my questions were pedantic. Had he presented the problem
cleanly, then one could try to answer. As he had some not well-defined
limits, I thought it prudent to ask before presenting solutions that may
not suit him. For instance, in light of his followup, something like my
Explode() function I trot out now and then wouldn't do, because it
returns a vector of strings.
If you are looking for a decent C++ implementation of
a tokenizer, have a look at the boost.org site where they have a decent
tokenizer.

Considering that he specified a "C-style C++ paradigm" I doubt Boost
will be in his solution set. Which is exactly why I asked.



Brian Rodenborn
 

Default User

Christopher said:
Default User said:
Then you need to be more specific about what you already have,

The original code looked like the following unholy mess (which I did
not write):

// unsigned int typedef'ed as uint
// assume appropriate #includes

char sTemp[64];
uint uDeptTime, uArrvTime;
if( argc>=2 ) {
if( sameas(argv[1], "") ) { // sameas ~ strcmp() with flavor
// error
}
strncpy( sTemp, argv[1], sizeof(sTemp) );
cp=strchr(sTemp, '.');
if( cp == NULL ) {
// error
}
uint const lene=strlen(cp);
uint const lenb=strlen(sTemp);
uint const lenr=lenb-lene;
sTemp[lenr]='\0';
uDeptTime=(uint)atoi(sTemp);
if( cp+1 ) {
uArrvTime=(uint)atoi(cp+1);
}
}

I wrote the following as a first approximation to a decent solution:

char *cp;
vector<uint> v;
char sTemp[64];
uint uDeptTime, uArrvTime, uMaxGT; // new variable
for( cp=argv[1] ; (cp=strchr(cp,'.')) != NULL ; ) {
v.push_back( atoi(cp++) ); // atoi() wraps the "standard" atoi, but
// I don't know the details
}
if( v.size() < 3 ) {
// error
}
uDeptTime=v[0];
uArrvTime=v[1];
uMaxGT=v[2];
and what your requirements are.

Straightline code only. (no additional class declarations)
Do you object to converting the C-style strings to std::strings or
std::stringstreams?

I'd love to use std::strings and/or std::stringstreams if they offer a
cleaner (not necessarily more "efficient") solution.

Ok, Donovan Rebbechi previously posted this:

The simplest way would be to use the getline() function and set the
optional field separator argument to '.'

std::istringstream in(mystring);

while (std::getline(in, mystring, '.'))
{
    stringlist.push_back(mystring);
}
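
The snippet above stops at a list of strings. A minimal sketch that carries it through to unsigned integers, assuming a std::vector is acceptable for the results (the name split_to_uints is hypothetical):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on '.' with std::getline and converts
// each token to unsigned. Returns false at the first token that does not
// convert; values parsed before that point remain in out.
bool split_to_uints(const std::string& text, std::vector<unsigned>& out)
{
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, '.')) {
        std::istringstream conv(token);
        unsigned value;
        if (!(conv >> value)) // empty or non-numeric token
            return false;
        out.push_back(value);
    }
    return true;
}
```

This sketch does not reject trailing characters inside a token ("12abc" converts as 12), and operator>> accepts a leading minus sign for unsigned types, so stricter input checking would need an extra test.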
The STL is never used in our code, and vectors in particular seem to
be frowned upon. std::stringstreams might be pushing the envelope.

My usual tool for this sort of thing is the Explode function I wrote for
string parsing. Unfortunately, it returns a vector of strings. I'll
present it anyway, you may be able to get some value from it. Or not.

#include <vector>
#include <string>

// breaks apart a string into substrings separated by a character string
// does not use a strtok() style list of separator characters
// returns a vector of std::strings

std::vector<std::string> Explode (const std::string &inString,
                                  const std::string &separator)
{
    std::vector<std::string> returnVector;
    std::string::size_type start = 0;
    std::string::size_type end = 0;

    while ((end=inString.find (separator, start)) != std::string::npos)
    {
        returnVector.push_back (inString.substr (start, end-start));
        start = end+separator.size();
    }

    returnVector.push_back (inString.substr (start));

    return returnVector;
}
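
To get from Explode()'s strings to numbers, one option is std::strtoul, whose end pointer shows whether the whole token was consumed. A sketch, with Explode() repeated from above only so that the block compiles on its own; the name ExplodeToUints is hypothetical:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Explode() as posted above, repeated so this sketch is self-contained.
std::vector<std::string> Explode(const std::string& inString,
                                 const std::string& separator)
{
    std::vector<std::string> returnVector;
    std::string::size_type start = 0;
    std::string::size_type end = 0;
    while ((end = inString.find(separator, start)) != std::string::npos) {
        returnVector.push_back(inString.substr(start, end - start));
        start = end + separator.size();
    }
    returnVector.push_back(inString.substr(start));
    return returnVector;
}

// Hypothetical helper: explodes inString on "." and converts every piece.
// Returns false at the first piece that is empty or not fully numeric.
bool ExplodeToUints(const std::string& inString,
                    std::vector<unsigned long>& out)
{
    const std::vector<std::string> parts = Explode(inString, ".");
    for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i) {
        const char* begin = parts[i].c_str();
        char* end = 0;
        const unsigned long value = std::strtoul(begin, &end, 10);
        // end == begin: nothing was converted; *end != '\0': trailing junk.
        if (end == begin || *end != '\0')
            return false;
        out.push_back(value);
    }
    return true;
}
```

strtoul still accepts leading whitespace and a sign, so "-1" would convert to a very large value rather than fail.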

Much.



Brian Rodenborn
 

David Rubin

Christopher said:
(if this is a FAQ, I apologize for not finding it)

I have a C-style string that I'd like to cleanly separate into tokens
(based on the '.' character) and then convert those tokens to unsigned
integers. What is the best standard(!) C++ way to accomplish this?

Here is some code I wrote for this following a similar discussion:

http://groups.google.com/groups?hl=en&th=e6865edab29b40ba&[email protected]

#include <string>

using namespace std;

template <typename InsertIter>
void
tokenize(const string& buf, const string& delim, InsertIter& ii)
{
    string::size_type sp(0);  /* start position */
    string::size_type ep(-1); /* end position; ep+1 wraps to 0 on the first pass */

    do{
        sp = buf.find_first_not_of(delim, ep+1);
        ep = buf.find_first_of(delim, sp);
        if(sp != ep){
            if(ep == buf.npos)
                ep = buf.length();
            *ii++ = buf.substr(sp, ep-sp);
        }
    }while(sp != buf.npos);
}

You fill the delim string and then do your I/O in a loop similar to the
following:

deque<string> tokens;
while(std::getline(cin, buf) && !cin.fail()){
    insert_iterator<deque<string> > ii(tokens, tokens.begin());
    tokenize(buf, delim, ii);
    if(tokens.size() > 0){
        copy(tokens.begin(), tokens.end(),
             ostream_iterator<string>(cout, "\n"));
        tokens.clear();
    }
}

The referenced discussion also indicates a relatively neater method
using locale and istringstream, but I don't have the implementation
handy.
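
For what it's worth, one common way to combine a locale with an istringstream is a ctype facet that classifies '.' as whitespace, so that operator>> skips it between numbers. This is only a sketch of that general technique and may not match what the referenced discussion used. The names dot_is_space and parse_with_locale are hypothetical, and it does need one small class, which may not fit a "straightline code only" rule.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic "C" character table with '.' also
// marked as whitespace.
class dot_is_space : public std::ctype<char>
{
public:
    dot_is_space() : std::ctype<char>(make_table()) {}

private:
    static const mask* make_table()
    {
        // Copy the classic table once, then flag '.' as a space character.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table['.'] |= space;
        return &table[0];
    }
};

// Hypothetical helper: reads every unsigned value from text, treating
// '.' (and ordinary whitespace) as a separator. Stops at the first
// thing that is not a number.
std::vector<unsigned> parse_with_locale(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new dot_is_space));

    std::vector<unsigned> values;
    unsigned value;
    while (in >> value)
        values.push_back(value);
    return values;
}
```

Because separators are skipped like whitespace, consecutive dots are not reported as empty tokens with this approach.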

HTH,

/david
 
