iomanip to escape xml

penguish

Hello,
I wanted an iomanipulator to do something like this:
....
string myxml("<root><text format=\"plain\">hello world</text></root>");
cout<<xmlescape<<myxml;
....
with output:
&lt;root&gt;&lt;text format=&quot;plain&quot;&gt;hello world&lt;/text&gt;&lt;/root&gt;

(i.e. <, >, &, ' and " become their equivalent entities)

I found a kind-of solution which I don't like so much and I wonder if
someone has a better idea how this can be achieved. It seems to me
that the iomanipulators can't look ahead in the ostream and see what's
coming and then replace it; and although I know how to install facets
for numbers (num_put) by 'imbuing' the locale, I can't see a way to
reformat the character output.
One of my kind-of solutions is to wrap the string in a simple class
that holds a reference to the string as a data member; then I
define an inserter for the new class which transforms the string using
for_each with string iterators.

XmlWrapper w(myxml);
cout<<w;

For some reason I can't simply use a temporary, so this doesn't work:

cout<<XmlWrapper(myxml);
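
The original XmlWrapper class is not shown, so the cause of the failure with a temporary is a guess. One common cause is an inserter (or constructor) that takes its argument by non-const reference, which a temporary cannot bind to. A minimal sketch that takes everything by const reference follows; the helper name write_xml_escaped is hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes s to os, replacing the five XML special characters.
inline void write_xml_escaped(std::ostream& os, const std::string& s) {
    for (char c : s) {
        switch (c) {
            case '<':  os << "&lt;";   break;
            case '>':  os << "&gt;";   break;
            case '&':  os << "&amp;";  break;
            case '\'': os << "&apos;"; break;
            case '"':  os << "&quot;"; break;
            default:   os.put(c);      break;
        }
    }
}

class XmlWrapper {
public:
    explicit XmlWrapper(const std::string& s) : text_(s) {}
    const std::string& text() const { return text_; }
private:
    const std::string& text_;  // refers to the caller's string, which must outlive the wrapper
};

// Taking the wrapper by const reference lets the inserter accept a temporary.
inline std::ostream& operator<<(std::ostream& os, const XmlWrapper& w) {
    write_xml_escaped(os, w.text());
    return os;
}
```

With this signature, cout<<XmlWrapper(myxml); compiles, because the temporary binds to the const reference parameter.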

My other (and currently used) solution is to define a new ostream
with a new unbuffered stream class which overrides the 'overflow'
method so that xml-escape characters become entities; so my final code
looks like

MyXmlStreamBuffer xbufstream;
std::ostream xstr(&xbufstream);
xstr<<myxml;

Is there a solution closer to my original intention? e.g. is it
possible to use the iword/pword information in combination with some
method I am unaware of which can change the characters on output?
 
James Kanze

I wanted an iomanipulator to do something like this:
...
string myxml("<root><text format=\"plain\">hello world</text></root>");
cout<<xmlescape<<myxml;
...
with output:
&lt;root&gt;&lt;text format=&quot;plain&quot;&gt;hello world&lt;/text&gt;&lt;/root&gt;
(i.e. <, >, &, ' and " become their equivalent entities)

Iomanip is the wrong tool for this. Iomanip is for customizing
formatted output for specific types. What you need here is a
filtering ostreambuf, which will systematically translate
certain characters, regardless of where they come from.
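
A minimal sketch of such a filtering streambuf, assuming an unbuffered design that forwards to a destination buffer supplied by the caller. The class name XmlEscapeBuf is made up for illustration; it is not the poster's MyXmlStreamBuffer, whose code is not shown.

```cpp
#include <cstring>
#include <ostream>
#include <sstream>
#include <streambuf>

// Unbuffered filter: with no put area, every character reaches overflow(),
// is escaped if necessary, and is forwarded straight to the destination.
class XmlEscapeBuf : public std::streambuf {
public:
    explicit XmlEscapeBuf(std::streambuf* dest) : dest_(dest) {}

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char* entity = nullptr;
        switch (traits_type::to_char_type(ch)) {
            case '<':  entity = "&lt;";   break;
            case '>':  entity = "&gt;";   break;
            case '&':  entity = "&amp;";  break;
            case '\'': entity = "&apos;"; break;
            case '"':  entity = "&quot;"; break;
        }
        if (entity == nullptr)
            return dest_->sputc(traits_type::to_char_type(ch));
        const std::streamsize n = static_cast<std::streamsize>(std::strlen(entity));
        return dest_->sputn(entity, n) == n ? ch : traits_type::eof();
    }

    // Flushing the filter flushes the destination.
    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;  // not owned
};
```

Because the translation happens below the formatting layer, numbers, strings and user-defined types are all filtered the same way once an ostream is constructed on top of the buffer.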

[...]
My other (and currently used) solution is to define a new
ostream with a new unbuffered stream class which overrides the
'overflow' method so that xml-escape characters become
entities; so my final code looks like
MyXmlStreamBuffer xbufstream;
std::ostream xstr(&xbufstream);
xstr<<myxml;
Is there a solution closer to my original intention?

What's wrong with this solution? It's the more or less standard
idiom in such cases.
e.g. is it possible to use the iword/pword information in
combination with some method I am unaware of which can change
the characters on output?

You can't modify the formatting of e.g. a string to ostream. It
is possible to define a special manipulator which returns
something other than an ostream, then define some different
formatting on that type, but if I understand you correctly, what
you really want is to translate specific characters, regardless
of what formatting has produced them. And that's the role of
the streambuf.
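
For comparison, the "manipulator which returns something other than an ostream" idea could look roughly like the sketch below. All names here (XmlEscapeTag, XmlEscapeProxy) are hypothetical. The proxy formats the next item into a temporary string stream, escapes the result, and hands back the real ostream, so only the single item that follows xmlescape is affected.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct XmlEscapeTag {};               // hypothetical tag type
const XmlEscapeTag xmlescape = {};    // lets callers write: os << xmlescape << value

class XmlEscapeProxy {
public:
    explicit XmlEscapeProxy(std::ostream& os) : os_(os) {}

    // Formats value separately, then writes the escaped text to the real stream.
    // Note: the real stream's formatting flags are not applied in this sketch.
    template <typename T>
    std::ostream& operator<<(const T& value) const {
        std::ostringstream formatted;
        formatted << value;
        for (char c : formatted.str()) {
            switch (c) {
                case '<':  os_ << "&lt;";   break;
                case '>':  os_ << "&gt;";   break;
                case '&':  os_ << "&amp;";  break;
                case '\'': os_ << "&apos;"; break;
                case '"':  os_ << "&quot;"; break;
                default:   os_.put(c);      break;
            }
        }
        return os_;
    }

private:
    std::ostream& os_;
};

// Inserting the tag yields the proxy instead of the stream.
inline XmlEscapeProxy operator<<(std::ostream& os, XmlEscapeTag) {
    return XmlEscapeProxy(os);
}
```

This gives the cout<<xmlescape<<myxml; syntax from the question, at the cost of an extra copy per item, and it does not catch characters produced by later insertions in the same statement.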
 
Jerry Coffin

[ ... ]
(i.e. <, >, &, ' and " become their equivalent entities)

I found a kind-of solution which I don't like so much and I wonder if
someone has a better idea how this can be achieved. It seems to me
that the iomanipulators can't look ahead in the ostream and see what's
coming and then replace it; and although I know how to install facets
for numbers (num_put) by 'imbuing' the locale, I can't see a way to
reformat the character output.

You'd reformat the character output using a codecvt facet. It's not
_exactly_ what I think they had in mind when they invented codecvt
facets, but it's pretty close -- close enough that it seems like a
sensible way to implement this functionality. Unfortunately, while
most documentation for locales and facets is pretty poor, that for codecvt
facets may be the worst of all, so the code won't be nearly as
trivial as it should be given the simplicity of the task.
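
A rough sketch of what such a facet might look like, assuming a char-to-char codecvt whose do_out expands the five special characters. The names XmlEscapeCodecvt and write_escaped_file are made up for illustration. The standard only requires file-based stream buffers to consult codecvt, so the sketch imbues an ofstream; whether cout goes through the facet depends on the implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

class XmlEscapeCodecvt : public std::codecvt<char, char, std::mbstate_t> {
public:
    explicit XmlEscapeCodecvt(std::size_t refs = 0)
        : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }    // variable width
    int do_max_length() const noexcept override { return 6; }  // "&quot;" / "&apos;"

    // Internal-to-external conversion: copy characters, expanding specials.
    result do_out(std::mbstate_t&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            const char* entity = entity_for(*from_next);
            const std::size_t len = entity ? std::strlen(entity) : 1;
            if (static_cast<std::size_t>(to_end - to_next) < len)
                return partial;  // output buffer full; caller retries with more room
            if (entity) std::memcpy(to_next, entity, len);
            else *to_next = *from_next;
            to_next += len;
            ++from_next;
        }
        return ok;
    }

private:
    static const char* entity_for(char c) {
        switch (c) {
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            case '&':  return "&amp;";
            case '\'': return "&apos;";
            case '"':  return "&quot;";
            default:   return nullptr;
        }
    }
};

// Hypothetical usage: the locale must be imbued before the file is opened.
inline void write_escaped_file(const std::string& path, const std::string& text) {
    std::ofstream file;
    file.imbue(std::locale(std::locale::classic(), new XmlEscapeCodecvt));
    file.open(path.c_str());
    file << text;
}
```

Input conversion (do_in) is left at the default here, so reading a file with this locale does not translate entities back.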
 
shaun

What's wrong with this solution?  It's the more or less standard
idiom in such cases.


You can't modify the formatting of e.g. a string to ostream.  It
is possible to define a special manipulator which returns
something other than an ostream, then define some different
formatting on that type, but if I understand you correctly, what
you really want is to translate specific characters, regardless
of what formatting has produced them.  And that's the role of
the streambuf.

Thanks for the reply. I should like to be able to turn this on and off
and mix it with cout as well; my application is outputting xml tags
(which have to retain the unchanged '<' and '>') followed by text
which has to be escaped. At the moment I use cout to output the tags
and then my xstr instance to output the xml which must be escaped.
Because my xstr is unbuffered, I have to be sure to flush cout before
sending things to xstr. Possibly there is a way to not only buffer
xstr but also to synchronize the buffer with the cout buffer? Or some
way to use my custom buffer but turn on/off the conversion with a
manipulator (so I don't have to use cout at all)? I'll think some more;
right now I consider the solution adequate (for my current use) but
unsatisfying (I'd like to re-use it in other contexts).
cheers
shaun
 
shaun

[ ... ]
(i.e. <, >, &, ' and " become their equivalent entities)
I found a kind-of solution which I don't like so much and I wonder if
someone has a better idea how this can be achieved. It seems to me
that the iomanipulators can't look ahead in the ostream and see what's
coming and then replace it; and although I know how to install facets
for numbers (num_put) by 'imbuing' the locale, I can't see a way to
reformat the character output.

You'd reformat the character output using a codecvt facet. It's not
_exactly_ what I think they had in mind when they invented codecvt
facets, but it's pretty close -- close enough that it seems like a
sensible way to implement this functionality. Unfortunately, while
most documentation for locales and facets is pretty poor, that for codecvt
facets may be the worst of all, so the code won't be nearly as
trivial as it should be given the simplicity of the task.

I'll take another look at this; I'm using Angelika Langer's book as a
reference. It wasn't obvious to me, amidst the talk of converting
wchars to char and back, that it would also cope with multicharacter
sequences of the same type when doing the conversion. I'll pay closer
attention and try some code.
thanks
shaun (penguish)
 
shaun

most documentation for locales and facets is pretty poor, for codecvt
facets may be the worst of all, so the code won't be nearly as
trivial as it should be given the simplicity of the task.

You were right. I managed it, going 'by the book', but very nearly failed
simply because I was unaware of
std::ios::sync_with_stdio( false );
which needed to be called once at the beginning of my 'main' (it seems one
can't switch it off and on partway through the program, which is
probably sensible).
Worth writing a blog article on, if I find time.
 
James Kanze

Thanks for the reply. I should like to be able to turn this on and off
and mix it with cout as well; my application is outputting xml tags
(which have to retain the unchanged '<' and '>') followed by text
which has to be escaped.

You can insert and remove filtering streambufs on the fly,
provided you ensure proper synchronization -- for output, just
not buffering in the filtering streambuf is sufficient.
At the moment I use cout to output the tags and then my xstr
instance to output the xml which must be escaped. Because my
xstr is unbuffered, I have to be sure to flush cout before
sending things to xstr. Possibly there is a way to not only
buffer xstr but also to synchronize the buffer with the cout
buffer?

Have it forward to the cout buffer, and don't buffer in it.
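
One way this might look in practice is a small RAII guard that swaps the stream's buffer for an unbuffered filter and restores it on scope exit. This is a sketch under assumptions: XmlEscapeBuf and ScopedXmlEscape are illustrative names, not code from the thread. Since the filter holds no characters of its own and forwards to the stream's original buffer, tags and escaped text come out in order without any explicit flush.

```cpp
#include <cstring>
#include <ostream>
#include <sstream>
#include <streambuf>

// Unbuffered escaping filter that forwards to an existing buffer.
class XmlEscapeBuf : public std::streambuf {
public:
    explicit XmlEscapeBuf(std::streambuf* dest) : dest_(dest) {}
protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char* entity = nullptr;
        switch (traits_type::to_char_type(ch)) {
            case '<':  entity = "&lt;";   break;
            case '>':  entity = "&gt;";   break;
            case '&':  entity = "&amp;";  break;
            case '\'': entity = "&apos;"; break;
            case '"':  entity = "&quot;"; break;
        }
        if (entity == nullptr)
            return dest_->sputc(traits_type::to_char_type(ch));
        const std::streamsize n = static_cast<std::streamsize>(std::strlen(entity));
        return dest_->sputn(entity, n) == n ? ch : traits_type::eof();
    }
    int sync() override { return dest_->pubsync(); }
private:
    std::streambuf* dest_;
};

// Installs the filter on construction, restores the original buffer on destruction.
class ScopedXmlEscape {
public:
    explicit ScopedXmlEscape(std::ostream& os)
        : os_(os), filter_(os.rdbuf()), saved_(os.rdbuf(&filter_)) {}
    ~ScopedXmlEscape() { os_.rdbuf(saved_); }
    ScopedXmlEscape(const ScopedXmlEscape&) = delete;
    ScopedXmlEscape& operator=(const ScopedXmlEscape&) = delete;
private:
    std::ostream& os_;
    XmlEscapeBuf filter_;     // forwards to the stream's original buffer
    std::streambuf* saved_;   // original buffer, restored in the destructor
};
```

Usage would be along the lines of writing the tag to cout, opening a block with ScopedXmlEscape guard(std::cout); for the text that needs escaping, and writing the closing tag after the block ends.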
Or some way to use my custom buffer but turn on/off
the conversion with a manipulator (so I don't have to use cout
at all)?

That's also possible, but a little more awkward. Basically, you
can arrange for the manipulator to insert or remove the
streambuf on the fly. But you'll probably need some way of
keeping track of the state (using pword), and you may need to
catch events from the stream in order to ensure clean up.
I'll think some more; right now I consider the solution
adequate (for my current use) but unsatisfying (I'd like to
re-use it in other contexts).

For starters, check out the literature on filtering streambufs
(sort of a bit of self-publicity there:)). And check out
Boost.Iostreams; it contains a very well done implementation
of filtering streambufs. And if you want a manipulator, check
out ios::xalloc and ios::pword, and the possibility of
registering for events from the stream. (One of the events, for
example, is that the stream is being destructed, so if you
allocated memory somewhere, you can free it.)
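
A rough sketch of the manipulator approach, assuming a pair of manipulators named xmlescape and noxmlescape (both names, and XmlFilterState, are hypothetical). State is kept per stream in a pword slot obtained from xalloc, and a callback registered with register_callback frees it when the stream is destroyed. Handling of copyfmt onto a stream whose filter is currently active is deliberately left out.

```cpp
#include <cstring>
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>

class XmlEscapeBuf : public std::streambuf {
public:
    void target(std::streambuf* dest) { dest_ = dest; }
protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char* entity = nullptr;
        switch (traits_type::to_char_type(ch)) {
            case '<':  entity = "&lt;";   break;
            case '>':  entity = "&gt;";   break;
            case '&':  entity = "&amp;";  break;
            case '\'': entity = "&apos;"; break;
            case '"':  entity = "&quot;"; break;
        }
        if (entity == nullptr)
            return dest_->sputc(traits_type::to_char_type(ch));
        const std::streamsize n = static_cast<std::streamsize>(std::strlen(entity));
        return dest_->sputn(entity, n) == n ? ch : traits_type::eof();
    }
    int sync() override { return dest_->pubsync(); }
private:
    std::streambuf* dest_ = nullptr;
};

// Per-stream state, owned through the stream's pword slot.
struct XmlFilterState {
    XmlEscapeBuf filter;
    std::streambuf* original = nullptr;
    bool active = false;
};

inline int xml_filter_slot() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Stream event handler: frees the state when the stream goes away.
inline void xml_filter_event(std::ios_base::event ev, std::ios_base& ios, int index) {
    void*& slot = ios.pword(index);
    if (ev == std::ios_base::erase_event) {
        delete static_cast<XmlFilterState*>(slot);
        slot = nullptr;
    } else if (ev == std::ios_base::copyfmt_event) {
        slot = nullptr;  // the copied pointer belongs to the source stream
    }
}

inline std::ostream& xmlescape(std::ostream& os) {
    const int index = xml_filter_slot();
    XmlFilterState* st = static_cast<XmlFilterState*>(os.pword(index));
    if (st == nullptr) {
        st = new XmlFilterState;
        os.pword(index) = st;
        os.register_callback(&xml_filter_event, index);
    }
    if (!st->active) {
        st->original = os.rdbuf();
        st->filter.target(st->original);
        os.rdbuf(&st->filter);  // note: rdbuf(sb) also clears the stream's error state
        st->active = true;
    }
    return os;
}

inline std::ostream& noxmlescape(std::ostream& os) {
    XmlFilterState* st = static_cast<XmlFilterState*>(os.pword(xml_filter_slot()));
    if (st != nullptr && st->active) {
        os.rdbuf(st->original);
        st->active = false;
    }
    return os;
}
```

Unlike a one-shot proxy, the filter stays installed until noxmlescape is inserted, so everything written in between is escaped regardless of its type.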
 
