Locales, file parsing and isspace, use_facet, etc.

Adrian

Hi All,

Is there any way to change what isspace thinks is a space character?

I am parsing some log files, and it would be nice to just read a field as whatever is between quotes or between [ and ], i.e. CLF (Common Log Format) log files.

I know I can go char by char or use find_last_of etc., but I would like to know if it is possible with locales and facets.

Also, are there any suggestions for outputting in multiple languages? Can I use locales again?
I was thinking of a class that has the language set at runtime and then outputs the correct text for that language.

//---------------------------------------------------------------------------

#include <sstream>
#include <string>
#include <locale>
#include <iostream>
#pragma hdrstop

//---------------------------------------------------------------------------

#pragma argsused
int main(int argc, char* argv[])
{
    std::stringstream strm("209.167.50.22 - - [25/Jan/2006:02:27:14 -0800] \"GET /Services/Development HTTP/1.1\" 301 352 \"-\" \"LinkWalker\"");
    std::string host;
    std::string ident;
    std::string authuser;
    std::string datetime;
    std::string http_request;
    std::string response_code;
    std::string xfer_size;
    std::string referer;
    std::string agent;

    strm >> host;
    strm >> ident;
    strm >> authuser;
    // set the isspace to a ]
    strm >> datetime;
    // set the isspace to a "
    strm >> http_request;
    // set it back to default
    strm >> response_code;
    strm >> xfer_size;
    // set to "
    strm >> referer;
    strm >> agent;

    std::cout << "host: " << host << std::endl;
    std::cout << "ident: " << ident << std::endl;
    std::cout << "authuser: " << authuser << std::endl;
    std::cout << "datetime: " << datetime << std::endl;
    std::cout << "http_request: " << http_request << std::endl;
    std::cout << "response_code: " << response_code << std::endl;
    std::cout << "xfer_size: " << xfer_size << std::endl;
    std::cout << "referer: " << referer << std::endl;
    std::cout << "agent: " << agent << std::endl;

    return 0;
}
 
Salt_Peter

Adrian said:
Hi All,

Is there any way to change what isspace thinks is a space character?

What? Did you mean use an alternate separator token(s)? If so, see
below.

Use std::getline to break down the istringstream using a delimiter
token:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::string stest("this$string$uses$an$alt$sep$token");
    std::istringstream iss(stest);

    // Split on '$' instead of on whitespace.
    std::vector< std::string > vs;
    std::string buffer;
    while( std::getline(iss, buffer, '$') )
    {
        vs.push_back(buffer);
    }

    // Print one token per line.
    std::copy( vs.begin(),
               vs.end(),
               std::ostream_iterator< std::string >(std::cout, "\n") );
}

/*
this
string
uses
an
alt
sep
token
*/

If that's not what you are looking for, then restate your question
clearly.
As far as locales are concerned, look up imbue.
 
Adrian

Salt_Peter said:
What? Did you mean use an alternate separator token(s)? If so, see
below.

No, I meant: can you change which characters isspace thinks are whitespace? To be honest, I thought the question was in plain English.
 
