Parsing Numeric Data

Mike Copeland

The function below (parseNum) seems convoluted and possibly
faulty...although it seems to work. In the code invocation (far below)
the data is real-world, and I wish to parse only the first 6 numeric
values. The number of values to be parsed varies, but there is always a
"termination value" of some alphabetic value or end-of-line. Thus, I
want this logic to act as though it's a variable-value "scanf".
Please advise if there's a "cleaner" way to do this. TIA


typedef vector<string> TOKENS1; // parsing structures
TOKENS1 tokArray;

size_t parseNum(string line) // Parse numeric value(s)
{
    string tok1, tok2;
    istringstream iss1(line);
    tokArray.clear();
    while(getline(iss1, tok1, ' '))
    {
        if(tok1.find(' ') != string::npos)
        {
            istringstream iss1(tok1);
            while(getline(iss1, tok2, ' '))
            {
                if(!tok2.empty()) tokArray.push_back(tok2);
            } // while
        } // if
        else
        {
            if(tok1 == "") continue;
            if(isdigit(tok1.at(0))) tokArray.push_back(tok1);
            else return tokArray.size();
        }
    } // while
    return tokArray.size();
} // parseNum

char m1[] = " 326 500 11 3900 11 3900 stop 10/29/2011 ";
size_t ii = parseNum(m1);
 
Chris Gordon-Smith

I had to solve a similar problem a while back. I've included the code I
came up with below. It seems that rather than using getline(), I had an
istream called Input_Stream and a string called Token, and tokenised
records in a loop containing the following construct:
Input_Stream >> Token

The Tokens are pushed onto a list of strings called Input_Record.

It works, although I have no doubt there are many things that could be
done much better.
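
For the numeric case in the original question, the same "stream >> token" construct might be sketched as below. This is only an illustration, not code from the project: the name parseLeadingNumbers is made up here, and it assumes a "numeric value" means a token made up entirely of digits. That is stricter than the original parseNum, which only tests the first character of each token.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect the leading all-digit
// tokens of a line. operator>> skips any run of whitespace, so repeated
// spaces never produce empty tokens.
std::vector<std::string> parseLeadingNumbers(const std::string& line)
{
    std::vector<std::string> tokens;
    std::istringstream iss(line);
    std::string tok;

    while (iss >> tok) {
        // Assumption: a numeric value consists of decimal digits only.
        const bool allDigits = std::all_of(tok.begin(), tok.end(),
            [](unsigned char c) { return std::isdigit(c) != 0; });
        if (!allDigits) break;      // termination value such as "stop"
        tokens.push_back(tok);
    }
    return tokens;                  // end of line also ends the loop
}
```

Returning the vector rather than filling a global keeps the function self-contained; the caller can take size() of the result if only the count is needed.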

Cout is a threadsafe wrapper for cout.

The code is available at
http://code.google.com/p/simsoup/source/browse/trunk/simsoup/src/Persistent_Data_Manager/Input_Record.cpp

I've included an example of the text parsed at the end.

bool Input_Record::Read_Record(istream& Input_Stream, bool& EOF_Flag,
                               string& Error_Text)

// Read the input record into a list of strings. A record is
// terminated by a semicolon. Comments start with "//" and are
// terminated by end of line

{
    TRACE;
    string SemiColon(";");
    string Token = "";
    EOF_Flag = false;
    bool End_Of_Record_Flag = false;

    while (not End_Of_Record_Flag)
    {
        if (not (Input_Stream >> Token))
        {
            EOF_Flag = true;
            if (not Token.empty())
            {
                Error_Text = Error_Text
                    + "Incomplete record at end of file - last token is "
                    + String_In_Quotes(Token);
                return false;
            }
            else
            {
                return true;
            }
        }

        // Echo comment text but otherwise ignore
        if ((Token.size() > 1)
            and ((Token.substr(0,2) == "//") || Token.substr(0,2) == "/*"))
        {
            ostringstream OutStream;
            OutStream << Token;
            char Text = ' ';
            while ((Text not_eq '\n') and (not Input_Stream.eof()))
            {
                Text = Input_Stream.get();
                OutStream << Text;
            }
            Cout::Get_Pt()->Write(OutStream);
            Token.clear();
        }
        else
        {
            // Detect end of record
            if (Token.substr(Token.size() -1,1) not_eq SemiColon)
            {
                Input_Record.push_back(Token);
            }
            else
            {
                End_Of_Record_Flag = true;
                Token.erase(Token.size() -1,1);
                if(not Token.empty())
                {
                    Input_Record.push_back(Token);
                }
            }
        }
    }
    Input_Record_For_Print = Input_Record;
    return true;
}

// Bond Types for Designed Atom Types
// ----------------------------------

// Assemblite
Add_BondType @time 2 @Atom1 a @Atom2 a @Order 1 @Enthalpy 1000;
Add_BondType @time 2 @Atom1 a @Atom2 h @Order 1 @Enthalpy 1000;
Add_BondType @time 2 @Atom1 a @Atom2 j @Order 1 @Enthalpy 1000;
Add_BondType @time 2 @Atom1 a @Atom2 l @Order 1 @Enthalpy 1000;
Add_BondType @time 2 @Atom1 a @Atom2 m @Order 1 @Enthalpy 10000;
Add_BondType @time 2 @Atom1 a @Atom2 p @Order 1 @Enthalpy 1000;
Add_BondType @time 2 @Atom1 a @Atom2 s @Order 1 @Enthalpy 1000;
Add_BondType @time 2 @Atom1 a @Atom2 t @Order 1 @Enthalpy 1000;

Chris Gordon-Smith
www.simsoup.info
 
Jorgen Grahn

typedef vector<string> TOKENS1; // parsing structures
TOKENS1 tokArray;

size_t parseNum(string line) // Parse numeric value(s)
{

Why not just return a vector of numbers, and why not pass the line as a
const reference?

I have a feeling I posted this the other week, but anyway ... this is
untested and probably not correct, but it's not much more complicated
than this. You can fix it up. Don't neglect to read the strtoul()
documentation carefully.

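vector<unsigned> parseNum(const string& line)
{
    const char* p = line.c_str();
    vector<unsigned> acc;

    while(1) {
        char* end;
        // strtoul() skips leading whitespace and sets end past the digits
        unsigned long n = strtoul(p, &end, 10);
        if(end==p) break;   // nothing converted: terminator or end of line
        acc.push_back(static_cast<unsigned>(n));
        // continue only if whitespace follows the number
        if(!*end || !isspace(static_cast<unsigned char>(*end))) break;
        p = end;
    }

    return acc;
}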
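
As for what reading the strtoul() documentation turns up: the function accepts a leading '+' or '-', and on overflow it returns ULONG_MAX and sets errno to ERANGE. A more defensive variant could be sketched as below. The name parseNumChecked is invented for this illustration, and the choice to stop at the first doubtful token (including one like "10/29/2011" where digits are followed by something other than whitespace) is an assumption about what the caller wants.

```cpp
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical variant (not from the thread) that guards against the cases
// the strtoul() documentation describes.
std::vector<unsigned> parseNumChecked(const std::string& line)
{
    std::vector<unsigned> acc;
    const char* p = line.c_str();

    for (;;) {
        // Skip whitespace ourselves so the first character can be inspected.
        while (std::isspace(static_cast<unsigned char>(*p))) ++p;
        // strtoul() would accept a sign here; require a digit instead.
        if (!std::isdigit(static_cast<unsigned char>(*p))) break;

        char* end = nullptr;
        errno = 0;
        const unsigned long n = std::strtoul(p, &end, 10);
        // Overflow is reported as ULONG_MAX with errno set to ERANGE.
        if (errno == ERANGE || n > UINT_MAX) break;
        // Assumption: a number must be followed by whitespace or end of line.
        if (*end != '\0' && !std::isspace(static_cast<unsigned char>(*end))) break;

        acc.push_back(static_cast<unsigned>(n));
        p = end;
    }
    return acc;
}
```

With the sample line from the original post this should yield the six values before "stop"; a line starting with "-5" yields nothing, because the sign is rejected before strtoul() is called.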

/Jorgen
 
