How to encode text into html format

Fred Yu

Hi,

I want to encode input text into html format such as replace "<" with "&lt",
replace "&" with "&amp".
Could you give me some ideas? Thanks.

Fred
 
Kai-Uwe Bux

Fred said:
Hi,

I want to encode input text into html format such as replace "<" with
"&lt", replace "&" with "&amp".
Could you give me some ideas? Thanks.


Containers: std::map< char, std::string >
Iterators: std::istream_iterator, std::ostream_iterator
Algorithms: std::transform


Best

Kai-Uwe Bux
 
AnonMail2005

Hi,

I want to encode input text into html format such as replace "<" with "&lt",
replace "&" with "&amp".
Could you give me some ideas? Thanks.

Fred

Google iconv. It will convert from many char encodings to many other char
encodings. I've used it to "format" text in various XML wrapper classes.
 
Elmar Baumann

Fred Yu said:
Hi,

I want to encode input text into html format such as replace "<" with
"&lt",
replace "&" with "&amp".

Example for the AnsiString class:

AnsiString Input; // contains the html code
int pos;

// Replace every "<" with "&lt;". Pos() returns a 1-based index,
// or 0 when the substring is not found.
while ((pos = Input.Pos("<")) > 0)
{
    Input.Delete(pos, 1);
    Input.Insert("&lt;", pos);
}
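For readers without AnsiString, roughly the same idea can be sketched with std::string. This is only a sketch under assumptions: html_escape is a made-up name, not something from this thread or from a library. It builds a new string in one pass instead of editing in place, so the "&" inside an already inserted "&amp;" is never escaped a second time.

```cpp
#include <string>

// Hypothetical helper (name is an assumption): returns a copy of `input`
// with the characters &, < and > replaced by their HTML entities.
std::string html_escape(const std::string& input)
{
    std::string output;
    output.reserve(input.size());   // lower bound; grows if entities are added

    for (std::string::size_type i = 0; i < input.size(); ++i) {
        switch (input[i]) {
        case '&': output += "&amp;"; break;
        case '<': output += "&lt;";  break;
        case '>': output += "&gt;";  break;
        default:  output += input[i]; break;
        }
    }
    return output;
}
```

With this sketch, html_escape("a < b & c") yields "a &lt; b &amp; c".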
 
James Kanze

Containers: std::map< char, std::string >
Iterators: std::istream_iterator, std::ostream_iterator
Algorithms: std::transform

Agreed for the first (although it may be overkill---in this
particular case, I think I'd go with a simple switch).

No real need for the second; just use istream::get() and
ostream::put() (or operator<< in some cases).

As to the third: how? You're replacing a single character with
a sequence of characters, and transform does a one to one (which
in practice makes it of fairly limited utility---although I've
used it with a vector<string>, ostream_iterator, and a string
transformer class that I've written, which works something like
$(patsubst...) in GNU make).
 
Kai-Uwe Bux

James said:
Agreed for the first (although it may be overkill---in this
particular case, I think I'd go with a simple switch).

No real need for the second; just use istream::get() and
ostream::put() (or operator<< in some cases).

As to the third: how? You're replacing a single character with
a sequence of characters, and transform does a one to one (which
in practice makes it of fairly limited utility---although I've
used it with a vector<string>, ostream_iterator, and a string
transformer class that I've written, which works something like
$(patsubst...) in GNU make).

I was thinking of something like this:

#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <algorithm>
#include <cassert>

struct encoder {

  std::map< char, std::string > the_map;

  encoder ( void ) {
    the_map[ 'a' ] = "a";
    // ...
    the_map[ '&' ] = "&amp;";
    // ...
  }

  // Look up the replacement text; every input character needs an entry.
  std::string const & operator() ( char ch ) const {
    std::map< char, std::string >::const_iterator iter =
      the_map.find( ch );
    assert( iter != the_map.end() );
    return ( iter->second );
  }
};

int main ( void ) {
  encoder the_encoder;
  std::transform( std::istreambuf_iterator<char>( std::cin ),
                  std::istreambuf_iterator<char>(),
                  std::ostream_iterator<std::string>( std::cout, "" ),
                  the_encoder );
}


Best

Kai-Uwe Bux
 
Frank Birbacher

Hi!

James said:
As to the third: how? You're replacing a single character with
a sequence of characters, and transform does a one to one (which
in practice makes it of fairly limited utility---although I've
used it with a vector<string>, ostream_iterator, and a string
transformer class that I've written, which works something like
$(patsubst...) in GNU make).

The source range of transform may have another value type than the
destination range.

char const* replace(char);

transform(str.begin(), str.end(),
ostream_iterator<const char*>(cout),
&replace);
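
One possible way to fill in replace(), as a sketch only. The table of one-character strings and the encode_with_transform() wrapper are assumptions added for illustration; they are not part of the declaration above.

```cpp
#include <algorithm>
#include <climits>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical definition of replace(): returns the text to emit for c.
char const* replace(char c)
{
    // Assumption: one NUL-terminated two-byte buffer per character value,
    // so that unchanged characters can also be returned as char const*.
    // Static storage is zero-initialized, so byte [1] is always '\0'.
    static char identity[UCHAR_MAX + 1][2];

    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    default: {
        unsigned char const index = static_cast<unsigned char>(c);
        identity[index][0] = c;
        return identity[index];
    }
    }
}

// Hypothetical wrapper showing the transform call on a std::string.
std::string encode_with_transform(std::string const& str)
{
    std::ostringstream out;
    std::transform(str.begin(), str.end(),
                   std::ostream_iterator<char const*>(out),
                   &replace);
    return out.str();
}
```

Two limitations of this sketch: an embedded '\0' in the input is dropped, because it maps to an empty C string, and the lazily filled table is not meant for concurrent callers.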

Frank
 
James Kanze

I was thinking of something like this:
#include <iostream>
#include <iterator>
#include <map>
#include <algorithm>
#include <cassert>
struct encoder {
std::map< char, std::string > the_map;
encoder ( void ) {
the_map[ 'a' ] = "a";
// ...
the_map[ '&' ] = "&amp;";
// ...
}
std::string const & operator() ( char ch ) const {
std::map< char, std::string >::const_iterator iter =
the_map.find( ch );
assert( iter != the_map.end() );
return ( iter->second );
}
};
int main ( void ) {
encoder the_encoder;
std::transform( std::istreambuf_iterator<char>( std::cin ),
std::istreambuf_iterator<char>(),
std::ostream_iterator<std::string>( std::cout, "" ),
the_encoder );
}

Which looks like a lot of overhead (including in terms of
programming) for very little gain. It might be worth it if you
create some sort of generic encoder, in order to reuse the idiom
in many different contexts, but for such a simple problem, it
just seems overkill for a onetime solution. As I said, I'd
probably go with the switch. If I were going to go to the
effort of initializing the map completely, I'd probably go with
a char const*[UCHAR_MAX + 1], rather than std::map. Or a map with
just the elements which don't use an identity transformation.
And I'd probably still write out the loop; somehow, the idea of
transforming each individual character into a string just to
output it bothers me.
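
A sketch of the sparse-map variant with the loop written out. The names encode_stream and encode_string are hypothetical, and the initializer list assumes C++11 or later.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out`, replacing only the characters
// that have an entry in the table. Everything else is passed through.
void encode_stream(std::istream& in, std::ostream& out)
{
    // Only the non-identity mappings are stored.
    static std::map<char, char const*> const special = {
        { '&', "&amp;" },
        { '<', "&lt;" },
        { '>', "&gt;" },
    };

    char ch;
    while (in.get(ch)) {
        std::map<char, char const*>::const_iterator it = special.find(ch);
        if (it != special.end()) {
            out << it->second;   // replacement text
        } else {
            out.put(ch);         // identity case: no string is created
        }
    }
}

// Hypothetical convenience wrapper for in-memory text.
std::string encode_string(std::string const& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    encode_stream(in, out);
    return out.str();
}
```

For filtering standard input, encode_stream(std::cin, std::cout) would be the equivalent call.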
 
James Kanze

James Kanze wrote:
The source range of transform may have another value type than the
destination range.

I'm aware of that, however...
char const* replace(char);
transform(str.begin(), str.end(),
ostream_iterator<const char*>(cout),
&replace);

For some reason, I was thinking in terms of std::string, and not
char const*. And converting each std::string seemed a bit heavy
for the task at hand. But a statically generated char const*[];
why not?
 
Kai-Uwe Bux

James said:
I was thinking of something like this:
#include <iostream>
#include <iterator>
#include <map>
#include <algorithm>
#include <cassert>
struct encoder {
std::map< char, std::string > the_map;
encoder ( void ) {
the_map[ 'a' ] = "a";
// ...
the_map[ '&' ] = "&amp;";
// ...
}
std::string const & operator() ( char ch ) const {
std::map< char, std::string >::const_iterator iter =
the_map.find( ch );
assert( iter != the_map.end() );
return ( iter->second );
}
};
int main ( void ) {
encoder the_encoder;
std::transform( std::istreambuf_iterator<char>( std::cin ),
std::istreambuf_iterator<char>(),
std::ostream_iterator<std::string>( std::cout, "" ),
the_encoder );
}

Which looks like a lot of overhead (including in terms of
programming) for very little gain. It might be worth it if you
create some sort of generic encoder, in order to reuse the idiom
in many different contexts, but for such a simple problem, it
just seems overkill for a onetime solution.

It's just what came to mind first. I tend to think of std::map whenever
there is an obvious table lookup. I like that because (a) it tends to have
exactly one line for each table entry, which can be formatted in such a way
that it is easy to read, and (b) the logic of table lookup is completely
decoupled from the rest of the program. Of course, a simple function

char const * encode ( char ch ) {
switch ( ch ) {
...
}
}

could do the same.

As I said, I'd
probably go with the switch. If I were going to go to the
effort of initializing the map completely, I'd probably go with
a char const*[UCHAR_MAX + 1], rather than std::map. Or a map with
just the elements which don't use an identity transformation.

Initializing the map completely is not a big deal at all. Just change the
constructor slightly:

for ( char ch = std::numeric_limits<char>::min();
      ch < std::numeric_limits<char>::max();
      ++ ch ) {
  the_map[ ch ] = ch;
}
the_map[ std::numeric_limits<char>::max() ] =
  std::numeric_limits<char>::max();
// now for the special characters:
the_map[ '&' ] = "&amp;";
...

And I'd probably still write out the loop; somehow, the idea of
transforming each individual character into a string just to
output it bothers me.


a) Note that the operator() of the encoder returns a string const &. So,
this does not really create a string each time just for output. It only
involves a few levels of indirection (something like char*** instead of
char*).

b) You can use

map< char, char const * >

instead of map< char, string >. Transform will just look up the char const *
and write it, which is very much the same as a hand coded loop. The price
to pay is that the trick from above for initializing all the characters
that are just passed through becomes more tricky.

c) Maybe you are thinking of a _real_ alternative:


#include <iostream>
#include <istream>
#include <ostream>

int main ( void ) {
  char ch;
  while ( std::cin.get( ch ) ) {
    switch ( ch ) {
      case '&' : { std::cout << "&amp;"; break; }
      case '<' : { std::cout << "&lt;"; break; }
      // ...
      default : { std::cout << ch; break; }
    }
  }
}


I have to admit that I don't like that. It mixes flow control and table
lookup to the effect that different types are piped to std::cout (char for
default and const char * for the other characters).



Best

Kai-Uwe Bux
 
Frank Birbacher

Hi!

James said:
char const* replace(char);
transform(str.begin(), str.end(),
ostream_iterator<const char*>(cout),
&replace);

For some reason, I was thinking in terms of std::string, and not
char const*. And converting each std::string seemed a bit heavy
for the task at hand. But a statically generated char const*[];
why not?

Yes. I think I needed such a conversion once and used a switch. The
obvious problem is to efficiently handle a char that is not transformed
to more than one char (the common case). I think I actually used
for_each instead of transform:

void appendReplacement(ostream& stream, const char c)
{
    switch(c)
    {
    case '<': stream << "&lt;"; break;
    default: stream << c; break;
    }
}

This makes it possible to append different types (char or char*) to the
stream and yet requires no [CHAR_MAX] array, but lets the compiler
choose the most efficient lookup (through the switch).

Of course, this function can be implemented as a functor.
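
A rough sketch of such a functor used with for_each. The names ReplacementAppender and encode_with_for_each are hypothetical, and the extra cases for '&' and '>' are an assumption beyond the single '<' case shown above.

```cpp
#include <algorithm>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical functor: appends the replacement for each character
// to the stream it was constructed with.
class ReplacementAppender {
public:
    explicit ReplacementAppender(std::ostream& stream) : stream_(&stream) {}

    void operator()(char c) const
    {
        switch (c) {
        case '&': *stream_ << "&amp;"; break;
        case '<': *stream_ << "&lt;";  break;
        case '>': *stream_ << "&gt;";  break;
        default:  *stream_ << c;       break;   // a char, not a char const*
        }
    }

private:
    std::ostream* stream_;   // pointer keeps the functor copyable
};

// Hypothetical wrapper showing the for_each call on a std::string.
std::string encode_with_for_each(std::string const& str)
{
    std::ostringstream out;
    std::for_each(str.begin(), str.end(), ReplacementAppender(out));
    return out.str();
}
```

for_each copies the functor, which is why it holds a pointer to the stream rather than any state of its own.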

Frank
 
James Kanze

James said:
James Kanze wrote:
Fred Yu wrote:
I want to encode input text into html format such as
replace "<" with "&lt", replace "&" with "&amp". Could
you give me some ideas? Thanks.
Containers: std::map< char, std::string >
Iterators: std::istream_iterator, std::ostream_iterator
Algorithms: std::transform
Agreed for the first (although it may be overkill---in this
particular case, I think I'd go with a simple switch).
No real need for the second; just use istream::get() and
ostream::put() (or operator<< in some cases).
As to the third: how? You're replacing a single character
with a sequence of characters, and transform does a one to
one (which in practice makes it of fairly limited
utility---although I've used it with a vector<string>,
ostream_iterator, and a string transformer class that I've
written, which works something like $(patsubst...) in GNU
make).
I was thinking of something like this:
#include <iostream>
#include <iterator>
#include <map>
#include <algorithm>
#include <cassert>
struct encoder {
std::map< char, std::string > the_map;
encoder ( void ) {
the_map[ 'a' ] = "a";
// ...
the_map[ '&' ] = "&amp;";
// ...
}
std::string const & operator() ( char ch ) const {
std::map< char, std::string >::const_iterator iter =
the_map.find( ch );
assert( iter != the_map.end() );
return ( iter->second );
}
};
int main ( void ) {
encoder the_encoder;
std::transform( std::istreambuf_iterator<char>( std::cin ),
std::istreambuf_iterator<char>(),
std::ostream_iterator<std::string>( std::cout, "" ),
the_encoder );
}
Which looks like a lot of overhead (including in terms of
programming) for very little gain. It might be worth it if you
create some sort of generic encoder, in order to reuse the idiom
in many different contexts, but for such a simple problem, it
just seems overkill for a onetime solution.
It's just what came to mind first. I tend to think of std::map
whenever there is an obvious table lookup.

I'll admit that I didn't think of this particular problem in
terms of table lookup, except to find the replacement string.
That's probably why my approach is so different. (Why I didn't
think of it in these terms is another question. I tend to use
table lookup a lot, even in cases where other people wouldn't.)
I like that because (a) it tends to have exactly one line for
each table entry, which can be formatted in such a way that it
is easy to read,

Or even better, can be generated mechanically. If I used this
solution, I'd probably start with something like:

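for ( int i = std::numeric_limits< char >::min() ;
      i <= std::numeric_limits< char >::max() ;
      ++ i ) {
  // std::string( count, character ): one copy of the character
  the_map[ static_cast< char >( i ) ] = std::string( 1, static_cast< char >( i ) ) ;
}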

and then reseat the special cases. (There are only three, after
all.) Or given my experience using C style arrays indexed by a
char (which goes back to before I'd even heard of C++), I might
just do that.
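
Spelled out, that approach might look like the sketch below. make_table is a hypothetical name, and the choice of '&', '<' and '>' as the three special cases is an assumption. Note the argument order of the constructor: std::string( count, character ).

```cpp
#include <limits>
#include <map>
#include <string>

// Hypothetical helper: builds a complete char -> replacement table.
std::map<char, std::string> make_table()
{
    std::map<char, std::string> table;

    // An int counter lets the loop reach max() without overflowing a char.
    for (int i = std::numeric_limits<char>::min();
         i <= std::numeric_limits<char>::max();
         ++i) {
        char const ch = static_cast<char>(i);
        table[ch] = std::string(1, ch);   // identity mapping
    }

    // Reseat the special cases.
    table['&'] = "&amp;";
    table['<'] = "&lt;";
    table['>'] = "&gt;";
    return table;
}
```

Every character then has an entry, so a lookup never fails.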
and (b) the logic of table lookup is completely decoupled from
the rest of the program. Of course, a simple function
char const * encode ( char ch ) {
switch ( ch ) {
...
}
}
could do the same.
As I said, I'd probably go with the switch. If I were going
to go to the effort of initializing the map completely, I'd
probably go with a char const*[UCHAR_MAX + 1], rather than
std::map. Or a map with just the elements which don't use
an identity transformation.
Initializing the map completely is not a big deal at all. Just
change the constructor slightly:
for ( char ch = std::numeric_limits<char>::min();
ch < std::numeric_limits<char>::max();
++ ch ) {
the_map[ ch ] = ch;
}
the_map[ std::numeric_limits<char>::max() ] =
std::numeric_limits<char>::max();
// now for the special characters:
the_map[ '&' ] = "&amp;";
...
And I'd probably still write out the loop; somehow, the idea
of transforming each individual character into a string just
to output it bothers me.
a) Note that the operator() of the encoder returns a string
const &. So, this does not really create a string each time
just for output. It only involves a few levels of indirection
(something like char*** instead of char*).

I wasn't thinking so much in terms of performance as in terms of... I
don't know what. Logically, I was approaching the problem from the
idea: copy the characters, with some special handling for a few
specific characters. Which of course suggests the switch. Of
course, that's probably conditioned by the number of times such
has really been the case: implementing things like printf, etc.,
where the special handling is more than just a one to one
replacement.

The more I think about it, the more I think you're right: it is
a simple mapping problem.
b) You can use
map< char, char const * >
instead of map< char, string >. Transform will just look up
the char const * and write it, which is very much the same as
a hand coded loop. The price to pay is that the trick from
above for initializing all the characters that are just passed
through becomes more tricky.

But nothing that a simple AWK script can't handle:).
c) Maybe you are thinking of a _real_ alternative:
#include <iostream>
#include <istream>
#include <ostream>
int main ( void ) {
char ch;
while ( std::cin.get( ch ) ) {
switch ( ch ) {
case '&' : { std::cout << "&amp;"; break; }
case '<' : { std::cout << "&lt;"; break; }
// ...
default : { std::cout << ch; break; }
}
}
}

That's what I was thinking of.
I have to admit that I don't like that. It mixes flow control
and table lookup to the effect that different types are piped
to std::cout (char for default and const char * for the other
characters).

Yes, but that's the way I first saw the problem. Special
handling for a few special characters, and not table driven code
translation. In this case, I'm probably wrong. I guess I've
just written too much code where it was a case of special
handling.
 
Fred Yu

Fred Yu said:
Hi,

I want to encode input text into html format such as replace "<" with
"&lt",
replace "&" with "&amp".
Could you give me some ideas? Thanks.

Fred

Thanks for your help.
 
