string splitting

Discussion in 'C++' started by xyz, Apr 29, 2008.

  1. xyz

    xyz Guest

    I have a string:
    16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168

    For the above string, the fields are:
    16:23:18.659343 -- time
    131.188.37.230 -- srcaddress
    22 -- srcport
    131.188.37.59 -- destaddress
    1398 -- destport
    tcp -- protocol
    168 -- size
    I need to split the string so that I get all of these parameters.
    The field widths are not fixed (the srcport, for example, sometimes has
    three digits and sometimes four), so I can't do it with the substr
    function alone. I need this in C++ and can't work out how to split it.
    Thank you for any help.
    xyz, Apr 29, 2008
    #1

  2. xyz

    Lars Uffmann Guest

    Google for "c++ explode"; you'll find plenty of information on how to do it.
    Lars Uffmann, Apr 29, 2008
    #2

  3. xyz

    Guest

    , Apr 29, 2008
    #3
  4. xyz

    Jim Langston Guest

    --
    Jim Langston

    "xyz" <> wrote in message
    news:...
    >I have a string
    > 16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
    >
    > for example lets say for the above string
    > 16:23:18.659343 -- time
    > 131.188.37.230 -- srcaddress
    > 22 --srcport
    > 131.188.37.59 --destaddress
    > 1398 --destport
    > tcp --protocol
    > 168 --size
    > i need to split the string such that i need to get all these
    > parameters....
    > the field widths are not fixed..i have some times four/three digits
    > srcport ..so i cant do it with substr function...i need this in c++
    > i am not getting an idea how to split it..
    > thank you for any help


    Not complete, but this gives you all the pieces.

    Use your favorite method for converting strings to ints; I'm showing a
    manual stringstream approach here, though I use a template myself.

    Output is:

    16:23:18.659343 -- time
    131.188.37.230.22 -- srcaddress/port
    131.188.37.59.1398 -- destaddress/port
    tcp -- protocol
    168 -- size

    131.188.37.230 : 22

    #include <string>
    #include <sstream>
    #include <iostream>

    int main()
    {
        std::string Input( "16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168" );
        std::stringstream Stream( Input );

        std::string Time;
        std::string SrcAddressPort;
        std::string DestAddressPort;
        std::string Protocol;
        int Size;

        // The fields are separated by whitespace, so operator>> splits them.
        if ( Stream >> Time >> SrcAddressPort >> DestAddressPort >> Protocol >> Size )
        {
            std::cout << Time << " -- time\n" <<
                SrcAddressPort << " -- srcaddress/port\n" <<
                DestAddressPort << " -- destaddress/port\n" <<
                Protocol << " -- protocol\n" <<
                Size << " -- size\n\n";
        }
        else
        {
            std::cerr << "Parsing error\n";
            return 1;   // nothing sensible to split if the read failed
        }

        std::string SrcAddress;
        std::string PortString;
        int SrcPort = 0;

        // The port is whatever follows the last '.' in "address.port".
        SrcAddress = SrcAddressPort.substr( 0, SrcAddressPort.find_last_of('.') );
        PortString = SrcAddressPort.substr( SrcAddressPort.find_last_of('.') + 1, std::string::npos );

        // Convert the port text to an int through a stringstream.
        std::stringstream Convert;
        Convert << PortString;
        Convert >> SrcPort;

        std::cout << SrcAddress << " : " << SrcPort << "\n";
    }
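
    The conversion template mentioned above is not shown in the post. As a
    rough sketch of what such a helper might look like (the name FromString
    and the bool return convention are assumptions, not the poster's code):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not taken from the post: converts text to any type
// that supports operator>>. Returns false if the conversion fails or if
// non-whitespace characters are left over after the value.
template <typename T>
bool FromString( const std::string& Text, T& Value )
{
    std::istringstream Convert( Text );
    if ( !( Convert >> Value ) )
        return false;

    char Extra;
    return !( Convert >> Extra );   // leftover text such as "22x" is an error
}
```

    With this, the port conversion could be written as
    FromString( PortString, SrcPort ) and checked for failure in one step.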
    Jim Langston, Apr 29, 2008
    #4
  5. xyz

    kwikius Guest

    "xyz" <> wrote in message
    news:...
    >I have a string
    > 16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
    >
    > for example lets say for the above string
    > 16:23:18.659343 -- time
    > 131.188.37.230 -- srcaddress
    > 22 --srcport
    > 131.188.37.59 --destaddress
    > 1398 --destport
    > tcp --protocol
    > 168 --size
    > i need to split the string such that i need to get all these
    > parameters....
    > the field widths are not fixed..i have some times four/three digits
    > srcport ..so i cant do it with substr function...i need this in c++
    > i am not getting an idea how to split it..
    > thank you for any help


    Parsing is best solved formally with a parser generator, for which the best
    option is to write a grammar.

    Below is an LL(1) grammar written as source code for the slk parser
    generator. An LL(1) grammar is very similar to a hand-written parser.

    http://home.earthlink.net/~slkpg/

    In the grammar, the parts prefixed with "__" are actions, for which you
    write the code in C++ (or C, Java or C#).
    Slk does most of the rest of the work of creating the application.

    ----------------

    /*
    slk grammar
    integer and tcp are terminals from the lexer
    */

    parser :
    time src dest proto

    time:
    integer __hr : integer __min : integer __sec_int [ . integer __sec_frac ]

    src:
    integer __s1 . integer __s2 . integer __s3 . integer __s4 . integer __port

    dest:
    integer __d1 . integer __d2 . integer __d3 . integer __d4 . integer __dport

    proto:
    tcp integer __size


    -----------------
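
    To illustrate how closely a hand-written parser can follow an LL(1)
    grammar like this one, here is a minimal sketch with one function per
    production. It is not generated by slk; the Packet struct, the class and
    function names, the port on the destination, and accepting any alphabetic
    word as the protocol are all assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical record type; the field names mirror the grammar's actions.
struct Packet {
    int hr, min, sec, sec_frac;   // sec_frac is a plain integer, so leading
                                  // zeros of the fraction are not preserved
    int src[4], src_port;
    int dest[4], dest_port;
    std::string proto;
    int size;
};

// One member function per production; each decides by the next character only.
class LineParser {
public:
    explicit LineParser(const std::string& text) : s_(text), pos_(0) {}

    // parser: time src dest proto
    bool parse(Packet& p) {
        return parseTime(p) && spaces()
            && parseEndpoint(p.src, p.src_port) && spaces()
            && parseEndpoint(p.dest, p.dest_port) && spaces()
            && parseWord(p.proto) && spaces()
            && parseInteger(p.size)
            && pos_ == s_.size();              // no trailing characters
    }

private:
    bool atDigit() const {
        return pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]));
    }
    bool expect(char c) {
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    bool spaces() {                            // one or more whitespace characters
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        return pos_ > start;
    }
    bool parseInteger(int& out) {              // 1 to 9 digits, so it fits in an int
        std::size_t start = pos_;
        int value = 0;
        while (atDigit()) {
            if (pos_ - start == 9) return false;
            value = value * 10 + (s_[pos_] - '0');
            ++pos_;
        }
        out = value;
        return pos_ > start;
    }
    // time: integer : integer : integer [ . integer ]
    bool parseTime(Packet& p) {
        if (!(parseInteger(p.hr) && expect(':') && parseInteger(p.min)
              && expect(':') && parseInteger(p.sec)))
            return false;
        p.sec_frac = 0;
        return expect('.') ? parseInteger(p.sec_frac) : true;
    }
    // src / dest: integer . integer . integer . integer . integer
    bool parseEndpoint(int (&addr)[4], int& port) {
        for (int i = 0; i < 4; ++i)
            if (!(parseInteger(addr[i]) && expect('.'))) return false;
        return parseInteger(port);
    }
    bool parseWord(std::string& out) {         // protocol name: letters only
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::isalpha(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        out = s_.substr(start, pos_ - start);
        return pos_ > start;
    }

    std::string s_;
    std::size_t pos_;
};
```

    Usage would be along the lines of: Packet p; if ( LineParser( line ).parse( p ) ) ...
    Unlike a generator, nothing here checks the grammar itself for
    ambiguities; the code only shows the one-to-one shape.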

    regards
    Andy Little
    kwikius, Apr 29, 2008
    #5
  6. xyz

    xyz Guest


    I solved it. Thanks to all.
    xyz, Apr 29, 2008
    #6
  7. xyz

    Default User Guest

    xyz wrote:

    > I have a string


    Pick a language. You posted the same thing (twice) to comp.lang.c.





    Brian
    Default User, Apr 29, 2008
    #7
  8. xyz

    Jim Langston Guest

    Jim Langston wrote:
    >> I have a string
    >> 16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168
    >>
    >> for example lets say for the above string
    >> 16:23:18.659343 -- time
    >> 131.188.37.230 -- srcaddress
    >> 22 --srcport
    >> 131.188.37.59 --destaddress
    >> 1398 --destport
    >> tcp --protocol
    >> 168 --size
    >> i need to split the string such that i need to get all these
    >> parameters....
    >> the field widths are not fixed..i have some times four/three digits
    >> srcport ..so i cant do it with substr function...i need this in c++
    >> i am not getting an idea how to split it..
    >> thank you for any help

    >
    > Not complete but giving you all the pieces.
    >
    > You should use your favorite method for converting from strings to
    > ints, I'im showing a manual stringstream way, but I use a template
    > myself.
    > Output is:
    >
    > 16:23:18.659343 -- time
    > 131.188.37.230.22 -- srcaddress/port
    > 131.188.37.59.1398 -- destaddress/port
    > tcp -- protocol
    > 168 -- size
    >
    > 131.188.37.230 : 22
    >
    > #include <string>
    > #include <sstream>
    > #include <iostream>
    >
    > int main()
    > {
    > std::string Input( "16:23:18.659343 131.188.37.230.22
    > 131.188.37.59.1398 tcp 168" );
    > std::stringstream Stream( Input );
    >
    > std::string Time;
    > std::string SrcAddressPort;
    > std::string DestAddressPort;
    > std::string Protocol;
    > int Size;
    >
    > if ( Stream >> Time >> SrcAddressPort >> DestAddressPort >>
    > Protocol >> Size )
    > {
    > std::cout << Time << " -- time\n" <<
    > SrcAddressPort << " -- srcaddress/port\n" <<
    > DestAddressPort << " -- destaddress/port\n" <<
    > Protocol << " -- protocol\n" <<
    > Size << " -- size\n\n";
    > }
    > else
    > std::cerr << "Parsing error\n";
    >
    > std::string SrcAddress;
    > std::string PortString;
    > int SrcPort = 0;
    >
    > SrcAddress = SrcAddressPort.substr( 0,
    > SrcAddressPort.find_last_of('.') );
    > PortString = SrcAddressPort.substr(
    > SrcAddressPort.find_last_of('.') + 1, std::string::npos );


    Oh, I forgot that substr has a default argument. This line can be simplified to:
    PortString = SrcAddressPort.substr( SrcAddressPort.find_last_of('.') + 1 );

    std::string::npos is the default for the 2nd parameter.

    > std::stringstream Convert;
    > Convert << PortString;
    > Convert >> SrcPort;
    >
    > std::cout << SrcAddress << " : " << SrcPort << "\n";
    >
    > }
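
    The quoted code only splits the source field and does not check whether a
    '.' was found. A small sketch of a reusable helper covering both the
    source and the destination field (SplitAddressPort is a hypothetical name,
    not part of the original code):

```cpp
#include <string>

// Hypothetical helper: splits "131.188.37.230.22" at the last '.' into
// "131.188.37.230" and "22". Returns false if there is no '.' at all or
// nothing after the last one.
bool SplitAddressPort( const std::string& Combined,
                       std::string& Address, std::string& Port )
{
    const std::string::size_type Dot = Combined.find_last_of( '.' );
    if ( Dot == std::string::npos || Dot + 1 == Combined.size() )
        return false;

    Address = Combined.substr( 0, Dot );
    Port    = Combined.substr( Dot + 1 );   // 2nd argument defaults to npos
    return true;
}
```

    The explicit npos check matters: without it, a field with no '.' would
    silently yield the whole string as both the address and the port.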


    --
    Jim Langston
    Jim Langston, Apr 30, 2008
    #8
  9. xyz

    James Kanze Guest

    On Apr 29, 3:32 pm, "kwikius" <> wrote:
    > "xyz" <> wrote in message
    > >I have a string
    > > 16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168


    > > for example lets say for the above string
    > > 16:23:18.659343 -- time
    > > 131.188.37.230 -- srcaddress
    > > 22 --srcport
    > > 131.188.37.59 --destaddress
    > > 1398 --destport
    > > tcp --protocol
    > > 168 --size
    > > i need to split the string such that i need to get all these
    > > parameters....
    > > the field widths are not fixed..i have some times four/three digits
    > > srcport ..so i cant do it with substr function...i need this in c++
    > > i am not getting an idea how to split it..


    > Parsing is best solved formally with a parser generator, for
    > which the best option is to write a grammar.


    I don't think that there's a general consensus about that. None
    of the C++ compilers I know use a parser generator for their
    grammar, for example, but prefer hand written ones.

    In the case at hand, of course, you don't even need a full
    parser; his problem can be solved simply by means of extended
    regular expressions, such as those supported by boost::regex.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Apr 30, 2008
    #9
  10. xyz

    Jim Langston Guest

    James Kanze wrote:
    > On Apr 29, 3:32 pm, "kwikius" <> wrote:
    >> "xyz" <> wrote in message
    >>> I have a string
    >>> 16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168

    >
    >>> for example lets say for the above string
    >>> 16:23:18.659343 -- time
    >>> 131.188.37.230 -- srcaddress
    >>> 22 --srcport
    >>> 131.188.37.59 --destaddress
    >>> 1398 --destport
    >>> tcp --protocol
    >>> 168 --size
    >>> i need to split the string such that i need to get all these
    >>> parameters....
    >>> the field widths are not fixed..i have some times four/three digits
    >>> srcport ..so i cant do it with substr function...i need this in c++
    >>> i am not getting an idea how to split it..

    >
    >> Parsing is best solved formally with a parser generator, for
    >> which the best option is to write a grammar.

    >
    > I don't think that there's a general consensus about that. None
    > of the C++ compilers I know use a parser generator for their
    > grammar, for example, but prefer hand written ones.
    >
    > In the case at hand, of course, you don't even need a full
    > parser; his problem can be solved simply by means of extended
    > regular expressions, such as those supported by boost::regex.


    From what I've read, C++0x is supposed to include regular expressions, which
    is good for this, but bad because I hate regex.

    Truthfully, though, having regex in the language will make parsing this type
    of thing a *lot* easier, although to me regular expressions usually look like
    just so much line noise.
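
    C++0x was eventually published as C++11, and it does include <regex>. A
    minimal sketch of how the line from the original question might be split
    with std::regex; the Fields struct, the function name parseLine and the
    exact pattern are assumptions made for illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical record; every field is kept as text for simplicity.
struct Fields {
    std::string time, srcAddress, srcPort, destAddress, destPort, protocol, size;
};

// Returns true and fills 'out' if the whole line matches the expected layout.
bool parseLine(const std::string& line, Fields& out)
{
    // Raw string literals avoid doubling every backslash; the adjacent
    // literals are concatenated into one pattern by the compiler.
    static const std::regex pattern(
        R"((\d+:\d+:\d+\.\d+)\s+)"            // 1: time
        R"((\d+\.\d+\.\d+\.\d+)\.(\d+)\s+)"   // 2, 3: source address, port
        R"((\d+\.\d+\.\d+\.\d+)\.(\d+)\s+)"   // 4, 5: destination address, port
        R"(([a-z]+)\s+)"                      // 6: protocol (lower-case word)
        R"((\d+))");                          // 7: size

    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;

    out.time        = m.str(1);
    out.srcAddress  = m.str(2);
    out.srcPort     = m.str(3);
    out.destAddress = m.str(4);
    out.destPort    = m.str(5);
    out.protocol    = m.str(6);
    out.size        = m.str(7);
    return true;
}
```

    The capture groups do the splitting, so no substr arithmetic is needed;
    whether the result is more readable than the stream version is a matter
    of taste.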

    --
    Jim Langston
    Jim Langston, Apr 30, 2008
    #10
  11. xyz

    kwikius Guest

    On Apr 30, 10:45 am, James Kanze <> wrote:
    > On Apr 29, 3:32 pm, "kwikius" <> wrote:


    <...>

    > > Parsing is best solved formally with a parser generator, for
    > > which the best option is to write a grammar.

    >
    > I don't think that there's a general consensus about that.  None
    > of the C++ compilers I know use a parser generator for their
    > grammar, for example, but prefer hand written ones.


    I used to agree, but some time ago someone "politely suggested" using a
    formal parser rather than writing parsers by hand, and now I am
    completely converted. Parser generators will verify the grammar that
    is presented to them and point out ambiguities that a hand-written
    parser would never spot (I have written various parsers by hand), and
    the grammars are easier for others to understand.

    Also, Bjarne Stroustrup himself says that the C++ grammar is "absurd".
    See:

    http://www.research.att.com/~bs/hopl-almost-final.pdf

    page 38, column 2, halfway down, the paragraph starting "However, tools
    and environments..."

    > In the case at hand, of course, you don't even need a full
    > parser; his problem can be solved simply by means of extended
    > regular expressions, such as those supported by boost::regex.


    I'm certainly no expert on regular expressions, but AFAIK you can't
    abstract part of a regular expression into a production (e.g. "integer"
    in my example above), so you end up with a long expression that is hard
    to read and verify. If you could have productions, I think you'd have a
    parser grammar. But as I say, I am no expert, and I'm sure someone will
    correct me if I'm wrong about that.

    regards
    Andy Little
    kwikius, Apr 30, 2008
    #11
  12. xyz

    James Kanze Guest

    On Apr 30, 9:54 pm, kwikius <> wrote:
    > On Apr 30, 10:45 am, James Kanze <> wrote:


    > > On Apr 29, 3:32 pm, "kwikius"
    > > <> wrote:


    > <...>


    > > > Parsing is best solved formally with a parser generator,
    > > > for which the best option is to write a grammar.


    > > I don't think that there's a general consensus about that.
    > > None of the C++ compilers I know use a parser generator for
    > > their grammar, for example, but prefer hand written ones.


    > I used to agree but someone some time ago "politely suggested"
    > using a formal parser rather than writing parsers by hand and
    > now I am completely converted. Parser generators will verify
    > the grammar that is presented to them and point out
    > ambiguities that a hand written parser would never spot. (I
    > have written various parsers by hand ) and are easier for
    > others to understand


    I think it depends a lot on the grammar. I regularly use flex
    for smaller things. In general, if the grammar isn't too
    complex, a parser generator may be simpler (and if you define a
    grammar yourself, you should definitely strive to make it not
    too complex). In practice, however, most real programming
    languages have very complex grammars (C++ is probably one of the
    worst), and hand written parsers can usually give better error
    messages, handle error recovery more gracefully, and it's also
    easier to "cheat" a bit when necessary to make things work. (I
    suspect, for example, that most C++ compilers use some sort of
    backtracking in cases where it isn't clear from the initial
    sequence whether you're dealing with a declaration or an
    expression.)

    As for "easier for others to understand", it obviously depends
    on which "others". I've been hassled for using flex because
    some of the "others" aren't familiar with the tool, and don't
    feel at home with anything more complex than recursive descent.

    > Also Bjarne Stroustrup himself says that C++ grammar is
    > "absurd". See:


    > http://www.research.att.com/~bs/hopl-almost-final.pdf


    > page 38 column 2, half way down, para starting "However ,
    > tools and environments..


    Yes. C++ is one of the most difficult languages to parse.

    > > In the case at hand, of course, you don't even need a full
    > > parser; his problem can be solved simply by means of
    > > extended regular expressions, such as those supported by
    > > boost::regex.


    > I'm sure no expert on regular expressions, but AFAIK you cant
    > abstract a part of a regular expression into a production (e.g
    > "integer" in my above example ), so you end up with a long
    > difficult to read and verify expression ( which is hard work).
    > If you could have productions... I think you'd have a parser
    > grammar. But as I say I am no expert and I'm sure someone will
    > correct me if I'm wrong about that.


    The grammar that he's parsing is regular, so you don't need
    anything more complicated than a regular expression. And the
    regular expression matchers I know (e.g. my own or Boost) all
    start with a string. So you would start with something like:

    std::string const integer( "\\d+" ) ;

    and build up the final expression as a string. For the original
    problem, you might end up with something like:

    #include <string>
    #include <boost/regex.hpp>

    std::string const integer( "\\d+" ) ;
    std::string const spaces( "\\s+" ) ;
    std::string const time(
        integer + ":" + integer + ":" + integer + "\\." + integer ) ;
    std::string const ipAddress(
        integer + "\\." + integer
                + "\\." + integer
                + "\\." + integer ) ;
    std::string const fullAddress(
        ipAddress + "\\." + integer ) ;
                            // Or should this use a "/" as a
                            // separator?
    std::string const protocol( "\\l+" ) ;
                            // or "\\S+" ?
    std::string const line( time
        + spaces + fullAddress
        + spaces + fullAddress
        + spaces + protocol
        + spaces + integer ) ;
    boost::regex pattern( line ) ;

    As usual: divide and conquer. (Note that if you're not afraid
    of a few local macros, the fact that C++ concatenates adjacent
    string literals means that you can actually do all of this at
    compile time, replacing the std::string const with #define, and
    dropping the +'s.)
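
    A sketch of that macro variant, for illustration only. It uses std::regex
    (C++11) rather than boost::regex, so the protocol is written as [a-z]+
    instead of \l+, and the RX_ macro names and the function matchesLine are
    made up for the example:

```cpp
#include <regex>
#include <string>

// Local macros: each expands to adjacent string literals, which the
// compiler concatenates into a single literal at compile time.
#define RX_INTEGER   "\\d+"
#define RX_SPACES    "\\s+"
#define RX_TIME      RX_INTEGER ":" RX_INTEGER ":" RX_INTEGER "\\." RX_INTEGER
#define RX_IP        RX_INTEGER "\\." RX_INTEGER "\\." RX_INTEGER "\\." RX_INTEGER
#define RX_FULL_ADDR RX_IP "\\." RX_INTEGER
#define RX_PROTOCOL  "[a-z]+"
#define RX_LINE      RX_TIME RX_SPACES RX_FULL_ADDR RX_SPACES RX_FULL_ADDR \
                     RX_SPACES RX_PROTOCOL RX_SPACES RX_INTEGER

// Returns true if the whole text has the layout of one log line.
bool matchesLine(const std::string& text)
{
    static const std::regex pattern(RX_LINE);
    return std::regex_match(text, pattern);
}

// The macros are only needed to build the pattern, so remove them again.
#undef RX_LINE
#undef RX_PROTOCOL
#undef RX_FULL_ADDR
#undef RX_IP
#undef RX_TIME
#undef RX_SPACES
#undef RX_INTEGER
```

    The pattern string is fixed at compile time, but the regular expression
    object is still constructed at run time, on the first call.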

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, May 2, 2008
    #12
