Parsing Numeric Data

Discussion in 'C++' started by Mike Copeland, Nov 8, 2012.

  1. The function below (parseNum) seems convoluted and possibly
    faulty...although it seems to work. In the code invocation (far below)
    the data is real-world, and I wish to parse only the first 6 numeric
    values. The number of values to be parsed varies, but there is always a
    "termination value" of some alphabetic value or end-of-line. Thus, I
    want this logic to act as though it's a variable-value "scanf".
    Please advise if there's a "cleaner" way to do this. TIA


    typedef vector<string> TOKENS1; // parsing structures
    TOKENS1 tokArray;

    size_t parseNum(string line) // Parse numeric value(s)
    {
        string tok1, tok2;
        istringstream iss1(line);
        tokArray.clear();
        while(getline(iss1, tok1, ' '))
        {
            if(tok1.find(' ') != string::npos)
            {
                istringstream iss1(tok1);
                while(getline(iss1, tok2, ' '))
                {
                    if(!tok2.empty()) tokArray.push_back(tok2);
                } // while
            } // if
            else
            {
                if(tok1 == "") continue;
                if(isdigit(tok1.at(0))) tokArray.push_back(tok1);
                else return tokArray.size();
            }
        } // while
        return tokArray.size();
    } // parseNum

    char m1[] = " 326 500 11 3900 11 3900 stop 10/29/2011 ";
    size_t ii = parseNum(m1);
    Mike Copeland, Nov 8, 2012

  2. On Thu, 08 Nov 2012 08:42:35 -0700, Mike Copeland wrote:

    > The function below (parseNum) seems convoluted and possibly
    > faulty...although it seems to work. In the code invocation (far below)
    > the data is real-world, and I wish to parse only the first 6 numeric
    > values. The number of values to be parsed varies, but there is always a
    > "termination value" of some alphabetic value or end-of-line. Thus, I
    > want this logic to act as though it's a variable-value "scanf".
    > Please advise if there's a "cleaner" way to do this. TIA
    >
    >


    I had to solve a similar problem a while back. I've included the code I
    came up with below. It seems that rather than using getline(), I had an
    istream called Input_Stream and a string called Token, and tokenised
    records in a loop containing the following construct:
    Input_Stream >> Token

    The Tokens are pushed onto a list of strings called Input_Record.
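
    Applied to the numeric problem in the original post, the same
    "stream >> token" idea might look like the sketch below. It is only an
    illustration: the name leadingNumericTokens is made up here, and it
    assumes that a "numeric value" means a token made up entirely of decimal
    digits, which is stricter than the first-character test in parseNum.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect the leading
// whitespace-separated tokens that consist only of decimal digits,
// stopping at the first token that does not.
std::vector<std::string> leadingNumericTokens(const std::string& line)
{
    std::istringstream input(line);
    std::vector<std::string> tokens;
    std::string token;

    // operator>> skips any run of whitespace, so empty tokens never appear
    // and no special handling of repeated spaces is needed.
    while (input >> token) {
        bool allDigits = true;
        for (unsigned char ch : token) {
            if (!std::isdigit(ch)) {
                allDigits = false;
                break;
            }
        }
        if (!allDigits) {
            break;  // termination value such as "stop"
        }
        tokens.push_back(token);
    }
    return tokens;
}
```

    With the sample line " 326 500 11 3900 11 3900 stop 10/29/2011 " this
    returns the six tokens before "stop". Returning the vector instead of
    filling a global keeps the function free of shared state.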

    It works, although I have no doubt there are many things that could be
    done much better.

    Cout is a threadsafe wrapper for cout.

    The code is available at
    http://code.google.com/p/simsoup/source/browse/trunk/simsoup/src/Persistent_Data_Manager/Input_Record.cpp

    I've included an example of the text parsed at the end.

    bool Input_Record::Read_Record(istream& Input_Stream, bool& EOF_Flag,
                                   string& Error_Text)

    // Read the input record into a list of strings. A record is
    // terminated by a semicolon. Comments start with "//" and are
    // terminated by end of line

    {
        TRACE;
        string SemiColon(";");
        string Token = "";
        EOF_Flag = false;
        bool End_Of_Record_Flag = false;

        while (not End_Of_Record_Flag)
        {
            if (not (Input_Stream >> Token))
            {
                EOF_Flag = true;
                if (not Token.empty())
                {
                    Error_Text = Error_Text
                        + "Incomplete record at end of file - last token is "
                        + String_In_Quotes(Token);
                    return false;
                }
                else
                {
                    return true;
                }
            }

            // Echo comment text but otherwise ignore
            if ((Token.size() > 1)
                and ((Token.substr(0,2) == "//") || Token.substr(0,2) == "/*"))
            {
                ostringstream OutStream;
                OutStream << Token;
                char Text = ' ';
                while ((Text not_eq '\n') and (not Input_Stream.eof()))
                {
                    Text = Input_Stream.get();
                    OutStream << Text;
                }
                Cout::Get_Pt()->Write(OutStream);
                Token.clear();
            }
            else
            {
                // Detect end of record
                if (Token.substr(Token.size() -1,1) not_eq SemiColon)
                {
                    Input_Record.push_back(Token);
                }
                else
                {
                    End_Of_Record_Flag = true;
                    Token.erase(Token.size() -1,1);
                    if(not Token.empty())
                    {
                        Input_Record.push_back(Token);
                    }
                }
            }
        }
        Input_Record_For_Print = Input_Record;
        return true;
    }

    // Bond Types for Designed Atom Types
    // ----------------------------------

    // Assemblite
    Add_BondType @Time 2 @Atom1 a @Atom2 a @Order 1 @Enthalpy 1000;
    Add_BondType @Time 2 @Atom1 a @Atom2 h @Order 1 @Enthalpy 1000;
    Add_BondType @Time 2 @Atom1 a @Atom2 j @Order 1 @Enthalpy 1000;
    Add_BondType @Time 2 @Atom1 a @Atom2 l @Order 1 @Enthalpy 1000;
    Add_BondType @Time 2 @Atom1 a @Atom2 m @Order 1 @Enthalpy 10000;
    Add_BondType @Time 2 @Atom1 a @Atom2 p @Order 1 @Enthalpy 1000;
    Add_BondType @Time 2 @Atom1 a @Atom2 s @Order 1 @Enthalpy 1000;
    Add_BondType @Time 2 @Atom1 a @Atom2 t @Order 1 @Enthalpy 1000;

    Chris Gordon-Smith
    www.simsoup.info
    Chris Gordon-Smith, Nov 8, 2012

  3. On Thu, 2012-11-08, Mike Copeland wrote:
    > The function below (parseNum) seems convoluted and possibly
    > faulty...although it seems to work. In the code invocation (far below)
    > the data is real-world, and I wish to parse only the first 6 numeric
    > values. The number of values to be parsed varies, but there is always a
    > "termination value" of some alphabetic value or end-of-line. Thus, I
    > want this logic to act as though it's a variable-value "scanf".
    > Please advise if there's a "cleaner" way to do this. TIA
    >
    >
    > typedef vector<string> TOKENS1; // parsing structures
    > TOKENS1 tokArray;
    >
    > size_t parseNum(string line) // Parse numeric value(s)
    > {


    Why not just return a vector of numbers, and why not pass the line as
    const reference?

    I have a feeling I posted this the other week, but anyway ... this is
    untested and probably not correct, but it's not much more complicated
    than this. You can fix it up. Don't neglect to read the strtoul()
    documentation carefully.

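    vector<unsigned> parseNum(const string& line)
    {
        const char* p = line.c_str();
        vector<unsigned> acc;

        while(1) {
            char* end;
            // strtoul skips leading whitespace and returns unsigned long,
            // which is narrowed to unsigned here.
            unsigned n = strtoul(p, &end, 10);
            if(end==p) break;   // no digits consumed: not a number
            acc.push_back(n);
            // Cast to unsigned char: passing a negative char to isspace
            // is undefined behaviour.
            if(!*end || !isspace(static_cast<unsigned char>(*end))) break;
            p = end;
        }

        return acc;
    }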
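
    As for what a careful reading of the strtoul() documentation turns up:
    it accepts a leading '+' or '-', and it reports overflow by setting
    errno to ERANGE. A stricter variant could look something like the sketch
    below. The name parseLeadingNumbers is made up for illustration, and it
    assumes that a token such as "10/29/2011" should be rejected outright
    rather than contributing its leading 10.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stricter variant (not from the thread): parse the leading
// unsigned decimal numbers of a line, separated by whitespace, and stop at
// the first thing that is not such a number.
std::vector<unsigned long> parseLeadingNumbers(const std::string& line)
{
    std::vector<unsigned long> numbers;
    const char* p = line.c_str();

    for (;;) {
        // Skip whitespace ourselves so the first real character can be checked.
        while (std::isspace(static_cast<unsigned char>(*p))) {
            ++p;
        }

        // strtoul would accept a leading '+' or '-'; insist on a digit instead.
        if (!std::isdigit(static_cast<unsigned char>(*p))) {
            break;
        }

        char* end = nullptr;
        errno = 0;
        const unsigned long value = std::strtoul(p, &end, 10);
        if (errno == ERANGE) {
            break;  // value does not fit in unsigned long
        }

        // The number must be followed by whitespace or the end of the string.
        if (*end != '\0' && !std::isspace(static_cast<unsigned char>(*end))) {
            break;
        }

        numbers.push_back(value);
        p = end;
    }
    return numbers;
}
```

    For the sample line in the original post this yields 326, 500, 11, 3900,
    11, 3900 and stops at "stop". Whether a half-numeric token should end
    the parse silently or be reported as an error depends on the data, so
    treat that choice as an assumption.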

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Nov 12, 2012
