string splitting

xyz

I have a string:
16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168

For example, for the above string:
16:23:18.659343 -- time
131.188.37.230 -- srcaddress
22 -- srcport
131.188.37.59 -- destaddress
1398 -- destport
tcp -- protocol
168 -- size
I need to split the string so that I get all of these parameters.
The field widths are not fixed: the srcport sometimes has three digits and
sometimes four, so I can't do it with the substr function. I need this in C++.
I have no idea how to split it.
Thank you for any help.
 
Jim Langston


Not complete, but this gives you all the pieces.

You should use your favorite method for converting from strings to ints.
I'm showing a manual stringstream way, but I use a template myself.

Output is:

16:23:18.659343 -- time
131.188.37.230.22 -- srcaddress/port
131.188.37.59.1398 -- destaddress/port
tcp -- protocol
168 -- size

131.188.37.230 : 22

#include <string>
#include <sstream>
#include <iostream>

int main()
{
    std::string Input( "16:23:18.659343 131.188.37.230.22 131.188.37.59.1398 tcp 168" );
    std::stringstream Stream( Input );

    std::string Time;
    std::string SrcAddressPort;
    std::string DestAddressPort;
    std::string Protocol;
    int Size = 0;

    // operator>> splits on whitespace, so each field lands in its own variable.
    if ( Stream >> Time >> SrcAddressPort >> DestAddressPort >> Protocol >> Size )
    {
        std::cout << Time << " -- time\n" <<
            SrcAddressPort << " -- srcaddress/port\n" <<
            DestAddressPort << " -- destaddress/port\n" <<
            Protocol << " -- protocol\n" <<
            Size << " -- size\n\n";
    }
    else
    {
        std::cerr << "Parsing error\n";
        return 1;   // nothing sensible to split if the line did not parse
    }

    std::string SrcAddress;
    std::string PortString;
    int SrcPort = 0;

    // The address is everything before the last '.', the port everything after it.
    SrcAddress = SrcAddressPort.substr( 0, SrcAddressPort.find_last_of('.') );
    PortString = SrcAddressPort.substr( SrcAddressPort.find_last_of('.') + 1, std::string::npos );

    // Convert the port text to an int.
    std::stringstream Convert;
    Convert << PortString;
    Convert >> SrcPort;

    std::cout << SrcAddress << " : " << SrcPort << "\n";
}
 
kwikius


Parsing is best solved formally with a parser generator, for which the best
option is to write a grammar.

Below is an LL(1) grammar written as source for the slk parser generator.
An LL(1) grammar is very similar to a hand-written parser:

http://home.earthlink.net/~slkpg/

In the grammar, the parts prefixed with "__" are actions, which you write
code for in C++ (or C, Java or C#).
Slk does most of the rest of the work in creating the application.

----------------

/*
slk grammar
integer and tcp are terminals from the lexer
*/

parser :
  time src dest proto

time:
  integer __hr : integer __min : integer __sec_int [ . integer __sec_frac ]

src:
  integer __s1 . integer __s2 . integer __s3 . integer __s4 . integer __port

dest:
  integer __d1 . integer __d2 . integer __d3 . integer __d4 . integer __dport

proto:
  tcp integer __size
 
xyz

I solved it. Thanks to all.
 
Jim Langston

Jim said:
PortString = SrcAddressPort.substr(
SrcAddressPort.find_last_of('.') + 1, std::string::npos );

Oh, I forgot about a substr overload. This line can be simplified to:
PortString = SrcAddressPort.substr( SrcAddressPort.find_last_of('.') + 1 );

std::string::npos is the default for the second parameter.
 
James Kanze

kwikius said:
Parsing is best solved formally with a parser generator, for
which the best option is to write a grammar.

I don't think that there's a general consensus about that. None
of the C++ compilers I know use a parser generator for their
grammar, for example, but prefer hand written ones.

In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of extended
regular expressions, such as those supported by boost::regex.
 
Jim Langston

James said:
I don't think that there's a general consensus about that. None
of the C++ compilers I know use a parser generator for their
grammar, for example, but prefer hand written ones.

In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of extended
regular expressions, such as those supported by boost::regex.

Reading up on C++0x, it is supposed to contain regular expressions, which is
good for this, but bad because I hate regex.

But, truthfully, having regex in the language will make parsing this type of
thing a *lot* easier, although to me regular expressions usually look like
just so much line noise.
 
kwikius

James said:
I don't think that there's a general consensus about that. None
of the C++ compilers I know use a parser generator for their
grammar, for example, but prefer hand written ones.

I used to agree, but some time ago someone "politely suggested" using a
formal parser rather than writing parsers by hand, and now I am
completely converted. Parser generators will verify the grammar that
is presented to them and point out ambiguities that a hand-written
parser would never spot (I have written various parsers by hand), and
they are easier for others to understand.

Also, Bjarne Stroustrup himself says that the C++ grammar is "absurd".
See:

http://www.research.att.com/~bs/hopl-almost-final.pdf

page 38, column 2, halfway down, the paragraph starting "However, tools and
environments..."

James said:
In the case at hand, of course, you don't even need a full
parser; his problem can be solved simply by means of extended
regular expressions, such as those supported by boost::regex.

I'm no expert on regular expressions, but AFAIK you can't abstract
a part of a regular expression into a production (e.g. "integer" in my
example above), so you end up with a long expression that is difficult
to read and hard work to verify. If you could have
productions... I think you'd have a parser grammar. But as I say, I am
no expert, and I'm sure someone will correct me if I'm wrong about
that.

regards
Andy Little
 
James Kanze

On Apr 29, 3:32 pm, "kwikius"
I used to agree but someone some time ago "politely suggested"
using a formal parser rather than writing parsers by hand and
now I am completely converted. Parser generators will verify
the grammar that is presented to them and point out
ambiguities that a hand written parser would never spot. (I
have written various parsers by hand ) and are easier for
others to understand

I think it depends a lot on the grammar. I regularly use flex
for smaller things. In general, if the grammar isn't too
complex, a parser generator may be simpler (and if you define a
grammar yourself, you should definitely strive to make it not
too complex). In practice, however, most real programming
languages have very complex grammars (C++ is probably one of the
worst), and hand written parsers can usually give better error
messages, handle error recovery more gracefully, and it's also
easier to "cheat" a bit when necessary to make things work. (I
suspect, for example, that most C++ compilers use some sort of
backtracking in cases where it isn't clear from the initial
sequence whether you're dealing with a declaration or an
expression.)

As for "easier for others to understand", it obviously depends
on which "others". I've been hassled for using flex because
some of the "others" aren't familiar with the tool, and don't
feel at home with anything more complex than recursive descent.

kwikius said:
Also Bjarne Stroustrup himself says that C++ grammar is
"absurd". See:

page 38 column 2, half way down, para starting "However ,
tools and environments..

Yes. C++ is one of the most difficult languages to parse.

kwikius said:
I'm sure no expert on regular expressions, but AFAIK you cant
abstract a part of a regular expression into a production (e.g
"integer" in my above example ), so you end up with a long
difficult to read and verify expression ( which is hard work).
If you could have productions... I think you'd have a parser
grammar. But as I say I am no expert and I'm sure someone will
correct me if I'm wrong about that.

The grammar that he's parsing is regular, so you don't need
anything more complicated than a regular expression. And the
regular expression matchers I know (e.g. my own or Boost) all
start with a string. So you would start with something like:

std::string const integer( "\\d+" ) ;

and build up the final expression as a string. For the original
problem, you might end up with something like:

// Needs <string> and <boost/regex.hpp>. Put these inside a function,
// so that the name "time" does not clash with ::time.
std::string const integer( "\\d+" ) ;
std::string const spaces( "\\s+" ) ;
std::string const time(
    integer + ":" + integer + ":" + integer + "\\." +
    integer ) ;
std::string const ipAddress(
    integer + "\\." + integer
            + "\\." + integer
            + "\\." + integer ) ;
std::string const fullAddress(
    ipAddress + "\\." + integer ) ;
                    // Or should this use a "/" as a
                    // separator?
std::string const protocol( "\\l+" ) ;
                    // or "\\S+" ?
std::string const line( time
    + spaces + fullAddress
    + spaces + fullAddress
    + spaces + protocol
    + spaces + integer ) ;
boost::regex pattern( line ) ;

As usual: divide and conquer. (Note that if you're not afraid
of a few local macros, the fact that C++ concatenates adjacent
string literals means that you can actually do all of this at
compile time, replacing the std::string const with #define, and
dropping the +'s.)
 
