How to encode text into html format

Discussion in 'C++' started by Fred Yu, Jun 1, 2008.

  1. Fred Yu

    Fred Yu Guest

    Hi,

    I want to encode input text into html format such as replace "<" with "&lt",
    replace "&" with "&amp".
    Could you give me some ideas? Thanks.

    Fred
     
    Fred Yu, Jun 1, 2008
    #1

  2. Kai-Uwe Bux

    Kai-Uwe Bux Guest

    Fred Yu wrote:

    > Hi,
    >
    > I want to encode input text into html format such as replace "<" with
    > "&lt", replace "&" with "&amp".
    > Could you give me some ideas? Thanks.



    Containers: std::map< char, std::string >
    Iterators: std::istream_iterator, std::ostream_iterator
    Algorithms: std::transform


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Jun 1, 2008
    #2

  3. Fred Yu

    Guest

    On Jun 1, 12:37 pm, "Fred Yu" <> wrote:
    > Hi,
    >
    > I want to encode input text into html format such as replace "<" with "&lt",
    > replace "&" with "&amp".
    > Could you give me some ideas? Thanks.
    >
    > Fred


    Google iconv. It converts text between many character encodings. I've
    used it to "format" text in various XML wrapper classes.
     
    , Jun 1, 2008
    #3
  4. "Fred Yu" <> wrote in message
    news:g1uka7$o1g$99.com...
    > Hi,
    >
    > I want to encode input text into html format such as replace "<" with
    > "&lt",
    > replace "&" with "&amp".


    Example for AnsiString Class

    AnsiString Input; // contains the html code
    int pos;

    // Replace each "<" with "&lt;". Pos() returns a 1-based index,
    // or 0 when the substring is not found.
    while ((pos = Input.Pos("<")) > 0)
    {
        Input.Delete(pos, 1);
        Input.Insert("&lt;", pos);
    }
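
For readers without AnsiString, the same replace-in-place idea can be sketched with std::string. This is only an illustration, not code from the thread: replace_all and html_escape are hypothetical helper names, and the '>' entry is an addition to the two characters the question mentions. The sketch handles "&" first and resumes each search after the inserted text, because the replacement "&amp;" itself contains "&".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of `from` in `text` with `to`, in place.
// The search resumes after the inserted text, so a replacement that
// contains `from` (as "&amp;" contains "&") cannot loop forever.
void replace_all(std::string& text, const std::string& from, const std::string& to)
{
    assert(!from.empty());
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();
    }
}

// Hypothetical helper: escape the characters that are special in HTML text.
// "&" is handled first; otherwise the "&" of "&lt;" would be escaped again.
std::string html_escape(std::string text)
{
    replace_all(text, "&", "&amp;");
    replace_all(text, "<", "&lt;");
    replace_all(text, ">", "&gt;");
    return text;
}
```

Calling html_escape on the text "a < b" would give "a &lt; b". Each pass rescans the whole string, which is fine for short inputs; the single-pass approaches discussed further down the thread avoid that.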
     
    Elmar Baumann, Jun 1, 2008
    #4
  5. James Kanze

    James Kanze Guest

    On Jun 1, 8:11 pm, Kai-Uwe Bux <> wrote:
    > Fred Yu wrote:
    > > I want to encode input text into html format such as replace "<" with
    > > "&lt", replace "&" with "&amp".
    > > Could you give me some ideas? Thanks.


    > Containers: std::map< char, std::string >
    > Iterators: std::istream_iterator, std::ostream_iterator
    > Algorithms: std::transform


    Agreed for the first (although it may be overkill---in this
    particular case, I think I'd go with a simple switch).

    No real need for the second; just use istream::get() and
    ostream::put() (or operator<< in some cases).

    As to the third: how? You're replacing a single character with
    a sequence of characters, and transform does a one to one (which
    in practice makes it of fairly limited utility---although I've
    used it with a vector<string>, ostream_iterator, and a string
    transformer class that I've written, which works something like
    $(patsubst...) in GNU make).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 1, 2008
    #5
  6. Kai-Uwe Bux

    Kai-Uwe Bux Guest

    James Kanze wrote:

    > On Jun 1, 8:11 pm, Kai-Uwe Bux <> wrote:
    >> Fred Yu wrote:
    >> > I want to encode input text into html format such as replace "<" with
    >> > "&lt", replace "&" with "&amp".
    >> > Could you give me some ideas? Thanks.

    >
    >> Containers: std::map< char, std::string >
    >> Iterators: std::istream_iterator, std::ostream_iterator
    >> Algorithms: std::transform

    >
    > Agreed for the first (although it may be overkill---in this
    > particular case, I think I'd go with a simple switch).
    >
    > No real need for the second; just use istream::get() and
    > ostream::put() (or operator<< in some cases).
    >
    > As to the third: how? You're replacing a single character with
    > a sequence of characters, and transform does a one to one (which
    > in practice makes it of fairly limited utility---although I've
    > used it with a vector<string>, ostream_iterator, and as string
    > transformer class that I've written, which works something like
    > $(patsubst...) in GNU make).


    I was thinking of something like this:

    #include <iostream>
    #include <iterator>
    #include <map>
    #include <string>
    #include <algorithm>
    #include <cassert>

    struct encoder {

        std::map< char, std::string > the_map;

        encoder ( void ) {
            the_map[ 'a' ] = "a";
            // ...
            the_map[ '&' ] = "&amp;";   // entity references end with ';'
            // ...
        }

        // Look up the replacement for ch; every character must be in the map.
        std::string const & operator() ( char ch ) const {
            std::map< char, std::string >::const_iterator iter =
                the_map.find( ch );
            assert( iter != the_map.end() );
            return ( iter->second );
        }
    };

    int main ( void ) {
        encoder the_encoder;
        std::transform( std::istreambuf_iterator<char>( std::cin ),
                        std::istreambuf_iterator<char>(),
                        std::ostream_iterator<std::string>( std::cout, "" ),
                        the_encoder );
    }


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Jun 1, 2008
    #6
  7. Hi!

    James Kanze wrote:
    > As to the third: how? You're replacing a single character with
    > a sequence of characters, and transform does a one to one (which
    > in practice makes it of fairly limited utility---although I've
    > used it with a vector<string>, ostream_iterator, and as string
    > transformer class that I've written, which works something like
    > $(patsubst...) in GNU make).


    The source range of transform may have another value type than the
    destination range.

    char const* replace(char);

    transform(str.begin(), str.end(),
    ostream_iterator<const char*>(cout),
    &replace);
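
The snippet above only declares replace(). A minimal sketch of one way to complete it follows, assuming a static table of one-character strings for the characters that pass through unchanged. The names replacement_for, encode_with_transform and encode_to_string are made up for this illustration (replacement_for plays the role of replace(), renamed to keep it clearly apart from std::replace), and the '&' and '>' entries are assumptions beyond what the post shows.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Special characters map to entity strings; every other character maps to
// a one-character C string kept in a static table.
// Assumptions: single-threaded use, and no '\0' in the input (it would map
// to an empty string and be dropped).
char const* replacement_for(char ch)
{
    // Zero-initialized, so the second element of each row is the terminator.
    static char identity[UCHAR_MAX + 1][2];
    switch (ch) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    default: {
        unsigned char const index = static_cast<unsigned char>(ch);
        identity[index][0] = ch;
        return identity[index];
    }
    }
}

// Source range of char, destination range of char const*.
void encode_with_transform(std::string const& str, std::ostream& out)
{
    std::transform(str.begin(), str.end(),
                   std::ostream_iterator<char const*>(out),
                   &replacement_for);
}

// Convenience wrapper for trying the sketch on a string.
std::string encode_to_string(std::string const& str)
{
    std::ostringstream out;
    encode_with_transform(str, out);
    return out.str();
}
```

The point of the sketch is that std::transform stays one-to-one in the number of elements, but each output element is a whole C string, so the text can still grow.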

    Frank
     
    Frank Birbacher, Jun 1, 2008
    #7
  8. James Kanze

    James Kanze Guest

    On Jun 1, 11:01 pm, Kai-Uwe Bux <> wrote:
    > James Kanze wrote:
    > > On Jun 1, 8:11 pm, Kai-Uwe Bux <> wrote:
    > >> Fred Yu wrote:
    > >> > I want to encode input text into html format such as
    > >> > replace "<" with "&lt", replace "&" with "&amp". Could
    > >> > you give me some ideas? Thanks.


    > >> Containers: std::map< char, std::string >
    > >> Iterators: std::istream_iterator, std::ostream_iterator
    > >> Algorithms: std::transform


    > > Agreed for the first (although it may be overkill---in this
    > > particular case, I think I'd go with a simple switch).


    > > No real need for the second; just use istream::get() and
    > > ostream::put() (or operator<< in some cases).


    > > As to the third: how? You're replacing a single character
    > > with a sequence of characters, and transform does a one to
    > > one (which in practice makes it of fairly limited
    > > utility---although I've used it with a vector<string>,
    > > ostream_iterator, and as string transformer class that I've
    > > written, which works something like $(patsubst...) in GNU
    > > make).


    > I was thinking of something like this:


    > #include <iostream>
    > #include <iterator>
    > #include <map>
    > #include <algorithm>
    > #include <cassert>


    > struct encoder {


    > std::map< char, std::string > the_map;


    > encoder ( void ) {
    > the_map[ 'a' ] = "a";
    > // ...
    > the_map[ '&' ] = "&amp";
    > // ...
    > }


    > std::string const & operator() ( char ch ) const {
    > std::map< char, std::string >::const_iterator iter =
    > the_map.find( ch );
    > assert( iter != the_map.end() );
    > return ( iter->second );
    > }
    > };


    > int main ( void ) {
    > encoder the_encoder;
    > std::transform( std::istreambuf_iterator<char>( std::cin ),
    > std::istreambuf_iterator<char>(),
    > std::ostream_iterator<std::string>( std::cout, "" ),
    > the_encoder );
    > }


    Which looks like a lot of overhead (including in terms of
    programming) for very little gain. It might be worth it if you
    create some sort of generic encoder, in order to reuse the idiom
    in many different contexts, but for such a simple problem, it
    just seems overkill for a one-time solution. As I said, I'd
    probably go with the switch. If I were going to go to the
    effort of initializing the map completely, I'd probably go with
    a char const*[UCHAR_MAX + 1], rather than std::map. Or a map with
    just the elements which don't use an identity transformation.
    And I'd probably still write out the loop; somehow, the idea of
    transforming each individual character into a string just to
    output it bothers me.
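
The last two ideas (a map holding only the characters that are not passed through unchanged, plus a hand-written loop) could look roughly like this. It is a sketch, not code from the thread: make_entity_table, encode_stream and encode_string are hypothetical names, and '>' is added to the two characters the original question mentions.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Table holding only the characters that are NOT passed through unchanged.
std::map<char, char const*> make_entity_table()
{
    std::map<char, char const*> table;
    table['&'] = "&amp;";
    table['<'] = "&lt;";
    table['>'] = "&gt;";
    return table;
}

// Copy `in` to `out`, replacing the characters found in the table.
void encode_stream(std::istream& in, std::ostream& out)
{
    static std::map<char, char const*> const table = make_entity_table();
    char ch;
    while (in.get(ch)) {
        std::map<char, char const*>::const_iterator entry = table.find(ch);
        if (entry != table.end()) {
            out << entry->second;   // special character: write its entity
        } else {
            out.put(ch);            // common case: copy the character as is
        }
    }
}

// Convenience wrapper for trying the sketch on a string.
std::string encode_string(std::string const& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    encode_stream(in, out);
    return out.str();
}
```

Here no per-character string is ever built: ordinary characters go straight through put(), and only the few special ones cost a table lookup hit.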

     
    James Kanze, Jun 2, 2008
    #8
  9. James Kanze

    James Kanze Guest

    On Jun 1, 11:25 pm, Frank Birbacher <> wrote:
    > James Kanze wrote:


    > > As to the third: how? You're replacing a single character with
    > > a sequence of characters, and transform does a one to one (which
    > > in practice makes it of fairly limited utility---although I've
    > > used it with a vector<string>, ostream_iterator, and as string
    > > transformer class that I've written, which works something like
    > > $(patsubst...) in GNU make).


    > The source range of transform may have another value type than the
    > destination range.


    I'm aware of that, however...

    > char const* replace(char);


    > transform(str.begin(), str.end(),
    > ostream_iterator<const char*>(cout),
    > &replace);


    For some reason, I was thinking in terms of std::string, and not
    char const*. And converting each std::string seemed a bit heavy
    for the task at hand. But a statically generated char const*[];
    why not?

     
    James Kanze, Jun 2, 2008
    #9
  10. Kai-Uwe Bux

    Kai-Uwe Bux Guest

    James Kanze wrote:

    > On Jun 1, 11:01 pm, Kai-Uwe Bux <> wrote:
    >> James Kanze wrote:
    >> > On Jun 1, 8:11 pm, Kai-Uwe Bux <> wrote:
    >> >> Fred Yu wrote:
    >> >> > I want to encode input text into html format such as
    >> >> > replace "<" with "&lt", replace "&" with "&amp". Could
    >> >> > you give me some ideas? Thanks.

    >
    >> >> Containers: std::map< char, std::string >
    >> >> Iterators: std::istream_iterator, std::ostream_iterator
    >> >> Algorithms: std::transform

    >
    >> > Agreed for the first (although it may be overkill---in this
    >> > particular case, I think I'd go with a simple switch).

    >
    >> > No real need for the second; just use istream::get() and
    >> > ostream::put() (or operator<< in some cases).

    >
    >> > As to the third: how? You're replacing a single character
    >> > with a sequence of characters, and transform does a one to
    >> > one (which in practice makes it of fairly limited
    >> > utility---although I've used it with a vector<string>,
    >> > ostream_iterator, and as string transformer class that I've
    >> > written, which works something like $(patsubst...) in GNU
    >> > make).

    >
    >> I was thinking of something like this:

    >
    >> #include <iostream>
    >> #include <iterator>
    >> #include <map>
    >> #include <algorithm>
    >> #include <cassert>

    >
    >> struct encoder {

    >
    >> std::map< char, std::string > the_map;

    >
    >> encoder ( void ) {
    >> the_map[ 'a' ] = "a";
    >> // ...
    >> the_map[ '&' ] = "&amp";
    >> // ...
    >> }

    >
    >> std::string const & operator() ( char ch ) const {
    >> std::map< char, std::string >::const_iterator iter =
    >> the_map.find( ch );
    >> assert( iter != the_map.end() );
    >> return ( iter->second );
    >> }
    >> };

    >
    >> int main ( void ) {
    >> encoder the_encoder;
    >> std::transform( std::istreambuf_iterator<char>( std::cin ),
    >> std::istreambuf_iterator<char>(),
    >> std::ostream_iterator<std::string>( std::cout, "" ),
    >> the_encoder );
    >> }

    >
    > Which looks like a lot of overhead (including in terms of
    > programming) for very little gain. It might be worth it if you
    > create some sort of generic encoder, in order to reuse the idiom
    > in many different contexts, but for such a simple problem, it
    > just seems overkill for a onetime solution.


    It's just what came to mind first. I tend to think of std::map whenever
    there is an obvious table lookup. I like that because (a) it tends to have
    exactly one line for each table entry, which can be formatted in such a way
    that it is easy to read, and (b) the logic of table lookup is completely
    decoupled from the rest of the program. Of course, a simple function

    char const * encode ( char ch ) {
    switch ( ch ) {
    ...
    }
    }

    could do the same.


    > As I said, I'd
    > probably go with the switch. If I were going to go to the
    > effort of initializing the map completely, I'd probably go with
    > a char const*[UCHAR_MAX], rather than std::map. Or a map with
    > just the elements which don't use an identity transformation.


    Initializing the map completely is not a big deal at all. Just change the
    constructor slightly:

    // requires #include <limits>
    for ( char ch = std::numeric_limits<char>::min();
          ch < std::numeric_limits<char>::max();
          ++ ch ) {
        the_map[ ch ] = ch;
    }
    the_map[ std::numeric_limits<char>::max() ] =
        std::numeric_limits<char>::max();
    // now for the special characters:
    the_map[ '&' ] = "&amp;";
    ...


    > And I'd probably still write out the loop; somehow, the idea of
    > transforming each individual character into a string just to
    > output it bothers me.



    a) Note that the operator() of the encoder returns a string const &. So,
    this does not really create a string each time just for output. It only
    involves a few levels of indirection (something like char*** instead of
    char*).

    b) You can use

    map< char, char const * >

    instead of map< char, string >. Transform will just look up the char const *
    and write it, which is very much the same as a hand coded loop. The price
    to pay is that the trick from above for initializing all the characters
    that are just passed through becomes more tricky.

    c) Maybe you are thinking of a _real_ alternative:


    #include <iostream>
    #include <istream>
    #include <ostream>

    int main ( void ) {
        char ch;
        while ( std::cin.get( ch ) ) {
            switch ( ch ) {
                case '&' : { std::cout << "&amp;"; break; }
                case '<' : { std::cout << "&lt;"; break; }
                // ...
                default : { std::cout << ch; break; }
            }
        }
    }


    I have to admit that I don't like that. It mixes flow control and table
    lookup to the effect that different types are piped to std::cout (char for
    default and const char * for the other characters).



    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Jun 2, 2008
    #10
  11. Hi!

    James Kanze wrote:
    >> char const* replace(char);

    >
    >> transform(str.begin(), str.end(),
    >> ostream_iterator<const char*>(cout),
    >> &replace);

    >
    > For some reason, I was thinking in terms of std::string, and not
    > char const*. And converting each std::string seemed a bit heavy
    > for the task at hand. But a statically generated char const*[];
    > why not?


    Yes. I think I needed such a conversion once and used a switch. The
    obvious problem is to efficiently handle a char that is not transformed
    to more than one char (the common case). I think I actually used
    for_each instead of transform:

    void appendReplacement(ostream& stream, const char c)
    {
        switch(c)
        {
        case '<': stream << "&lt;"; break;
        default: stream << c; break;
        }
    }

    This makes it possible to append different types (char or char*) to the
    stream and yet requires no [CHAR_MAX] array, but lets the compiler
    choose the most efficient lookup (through the switch).

    Of course, this function can also be implemented as a functor.
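
A functor version for use with std::for_each might look like the sketch below. The class name ReplacementAppender and the wrapper encodeWithForEach are invented for this illustration, and the '&' and '>' cases are assumptions added to the single '<' case shown in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Functor form of appendReplacement(): it remembers the target stream so
// that std::for_each can call it with one character at a time.
class ReplacementAppender
{
public:
    explicit ReplacementAppender(std::ostream& stream) : stream_(&stream) {}

    void operator()(char c) const
    {
        switch (c)
        {
        case '&': *stream_ << "&amp;"; break;
        case '<': *stream_ << "&lt;"; break;
        case '>': *stream_ << "&gt;"; break;
        default:  *stream_ << c; break;
        }
    }

private:
    // A pointer keeps the functor copyable; std::for_each takes it by value.
    std::ostream* stream_;
};

// Convenience wrapper for trying the sketch on a string.
std::string encodeWithForEach(const std::string& text)
{
    std::ostringstream out;
    std::for_each(text.begin(), text.end(), ReplacementAppender(out));
    return out.str();
}
```

As in the free function, the switch lets a plain char and a string literal go to the same stream without any intermediate table.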

    Frank
     
    Frank Birbacher, Jun 2, 2008
    #11
  12. James Kanze

    James Kanze Guest

    On Jun 2, 11:55 am, Kai-Uwe Bux <> wrote:
    > James Kanze wrote:
    > > On Jun 1, 11:01 pm, Kai-Uwe Bux <> wrote:
    > >> James Kanze wrote:
    > >> > On Jun 1, 8:11 pm, Kai-Uwe Bux <> wrote:
    > >> >> Fred Yu wrote:
    > >> >> > I want to encode input text into html format such as
    > >> >> > replace "<" with "&lt", replace "&" with "&amp". Could
    > >> >> > you give me some ideas? Thanks.


    > >> >> Containers: std::map< char, std::string >
    > >> >> Iterators: std::istream_iterator, std::ostream_iterator
    > >> >> Algorithms: std::transform


    > >> > Agreed for the first (although it may be overkill---in this
    > >> > particular case, I think I'd go with a simple switch).


    > >> > No real need for the second; just use istream::get() and
    > >> > ostream::put() (or operator<< in some cases).


    > >> > As to the third: how? You're replacing a single character
    > >> > with a sequence of characters, and transform does a one to
    > >> > one (which in practice makes it of fairly limited
    > >> > utility---although I've used it with a vector<string>,
    > >> > ostream_iterator, and as string transformer class that I've
    > >> > written, which works something like $(patsubst...) in GNU
    > >> > make).


    > >> I was thinking of something like this:


    > >> #include <iostream>
    > >> #include <iterator>
    > >> #include <map>
    > >> #include <algorithm>
    > >> #include <cassert>


    > >> struct encoder {


    > >> std::map< char, std::string > the_map;


    > >> encoder ( void ) {
    > >> the_map[ 'a' ] = "a";
    > >> // ...
    > >> the_map[ '&' ] = "&amp";
    > >> // ...
    > >> }


    > >> std::string const & operator() ( char ch ) const {
    > >> std::map< char, std::string >::const_iterator iter =
    > >> the_map.find( ch );
    > >> assert( iter != the_map.end() );
    > >> return ( iter->second );
    > >> }
    > >> };


    > >> int main ( void ) {
    > >> encoder the_encoder;
    > >> std::transform( std::istreambuf_iterator<char>( std::cin ),
    > >> std::istreambuf_iterator<char>(),
    > >> std::ostream_iterator<std::string>( std::cout, "" ),
    > >> the_encoder );
    > >> }


    > > Which looks like a lot of overhead (including in terms of
    > > programming) for very little gain. It might be worth it if you
    > > create some sort of generic encoder, in order to reuse the idiom
    > > in many different contexts, but for such a simple problem, it
    > > just seems overkill for a onetime solution.


    > It's just what came to mind first. I tend to think of std::map
    > whenever there is an obvious table lookup.


    I'll admit that I didn't think of this particular problem in
    terms of table lookup, except to find the replacement string.
    That's probably why my approach is so different. (Why I didn't
    think of it in these terms is another question. I tend to use
    table lookup a lot, even in cases where other people wouldn't.)

    > I like that because (a) it tends to have exactly one line for
    > each table entry, which can be formatted in such a way that it
    > is easy to read,


    Or even better, can be generated mechanically. If I used this
    solution, I'd probably start with something like:

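    for ( int i = std::numeric_limits< char >::min() ;
          i <= std::numeric_limits< char >::max() ;
          ++ i ) {
        // std::string( count, ch ): one copy of the character i
        the_map[ static_cast< char >( i ) ] = std::string( 1, static_cast< char >( i ) ) ;
    }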

    and then reseat the special cases. (There are only three, after
    all.) Or given my experience using C style arrays indexed by a
    char (which goes back to before I'd even heard of C++), I might
    just do that.

    > and (b) the logic of table lookup is completely decoupled from
    > the rest of the program. Of course, a simple function


    > char const * encode ( char ch ) {
    > switch ( ch ) {
    > ...
    > }
    > }


    > could do the same.


    > > As I said, I'd probably go with the switch. If I were going
    > > to go to the effort of initializing the map completely, I'd
    > > probably go with a char const*[UCHAR_MAX], rather than
    > > std::map. Or a map with just the elements which don't use
    > > an identity transformation.


    > Initializing the map completely is not a big deal at all. Just
    > change the constructor slightly:


    > for ( char ch = std::numeric_limits<char>::min();
    > ch < std::numeric_limits<char>::max();
    > ++ ch ) {
    > the_map[ ch ] = ch;
    > }
    > the_map[ std::numeric_limits<char>::max() ] =
    > std::numeric_limits<char>::max();
    > // now for the special characters:
    > the_map[ '&' ] = "&amp";
    > ...


    > > And I'd probably still write out the loop; somehow, the idea
    > > of transforming each individual character into a string just
    > > to output it bothers me.


    > a) Note that the operator() of the encoder returns a string
    > const &. So, this does not really create a string each time
    > just for output. It only involves a few levels of indirection
    > (something like char*** instead of char*).


    I wasn't thinking so much in terms of performance, as I don't
    know what. Logically, I was approaching the problem from the
    idea: copy the characters, with some special handling for a few
    specific characters. Which of course suggests the switch. Of
    course, that's probably conditioned by the number of times such
    has really been the case: implementing things like printf, etc.,
    where the special handling is more than just a one to one
    replacement.

    The more I think about it, the more I think you're right: it is
    a simple mapping problem.

    > b) You can use


    > map< char, char const * >


    > instead of map< char, string >. Transform will just look up
    > the char const * and write it, which is very much the same as
    > a hand coded loop. The price to pay is that the trick from
    > above for initializing all the characters that are just passed
    > through becomes more tricky.


    But nothing that a simple AWK script can't handle:).

    > c) Maybe you are thinking of a _real_ alternative:


    > #include <iostream>
    > #include <istream>
    > #include <ostream>


    > int main ( void ) {
    > char ch;
    > while ( std::cin.get( ch ) ) {
    > switch ( ch ) {
    > case '&' : { std::cout << "&amp;"; break; }
    > case '<' : { std::cout << "&lt;"; break; }
    > // ...
    > default : { std::cout << ch; break; }
    > }
    > }
    > }


    That's what I was thinking of.

    > I have to admit that I don't like that. It mixes flow control
    > and table lookup to the effect that different types are piped
    > to std::cout (char for default and const char * for the other
    > characters).


    Yes, but that's the way I first saw the problem. Special
    handling for a few special characters, and not table driven code
    translation. In this case, I'm probably wrong. I guess I've
    just written too much code where it was a case of special
    handling.

     
    James Kanze, Jun 2, 2008
    #12
  13. Fred Yu

    Fred Yu Guest

    "Fred Yu" <> wrote in message news:g1uka7$o1g$99.com...
    > Hi,
    >
    > I want to encode input text into html format such as replace "<" with
    > "&lt",
    > replace "&" with "&amp".
    > Could you give me some ideas? Thanks.
    >
    > Fred
    >
    >


    Thanks for your help.
     
    Fred Yu, Jun 3, 2008
    #13