Encoding problem: Rsquo to a with a hat

Discussion in 'Perl Misc' started by afrinspray, Oct 26, 2006.

  1. afrinspray

    afrinspray Guest

    I posted a message titled "Best way to remove body/html tag from
    HTML::Element tree" on Sep 6 2006.

    Tad McClellan helped me out by referring me to
    http://perlmonks.org/?node_id=554219 which explains using
    XML::SAX::Writer. Everything was going well with the tag parsing until
    I started giving the sax parser special characters for quotes:

    Hopefully these characters make it through... it's converting:
    & r s q u o ; (no spaces)
    to:
    â (a with a hat)

    Thanks in advance....


    Mike
     
    afrinspray, Oct 26, 2006
    #1

  2. afrinspray <> wrote:

    > Tad McClellan helped me out by referring me to
    > http://perlmonks.org/?node_id=554219



    No I didn't.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Oct 26, 2006
    #2

  3. afrinspray

    afrinspray Guest

    afrinspray, Oct 26, 2006
    #3
  4. afrinspray

    afrinspray Guest

    Sorry that was Todd W.:
    http://groups-beta.google.com/group/comp.lang.perl.misc/browse_thread/thread/a1d24b4eec251e80/

    Anyway, does anyone have any ideas how I can get it to stop converting
    & n b s p ; and other standard HTML entities to gibberish?


    Mike

    On Oct 26, 2:02 pm, Tad McClellan <> wrote:
    > afrinspray <> wrote:
    > > Tad McClellan helped me out by referring me to
    > > http://perlmonks.org/?node_id=554219
    >
    > No I didn't.
    >
    > --
    > Tad McClellan SGML consulting
    > Perl programming
    > Fort Worth, Texas
     
    afrinspray, Oct 27, 2006
    #4
  5. afrinspray

    afrinspray Guest

    OK, after some research I think I can better narrow down the problem
    I'm having. The module XML::Filter::SAX1toSAX2 is converting my HTML
    entities (&nbsp;, &#8217;, etc.) to weird characters.

    I changed the XML::SAX::Machines Pipeline in my code from this:

        my $machine = Pipeline(
            'XML::Filter::SAX1toSAX2' =>
            'XML::Filter::BufferText' =>
            'XML::Filter::HtmlTagStripper' =>
            $writer
        );

    to this:

        my $machine = Pipeline(
            'XML::Filter::SAX1toSAX2' =>
            \*STDOUT
        );

    and it's still converting the entities to gibberish. Is there another
    SAX1toSAX2-like module out there? Can anyone think of a replacement?
    If I remove the SAX1toSAX2 call from the Pipeline, there's no output.

    Also, on a side note, I previously decoded the input using
    MIME::Decoder...

    Any help would be greatly appreciated.

    Mike


    afrinspray wrote:
    > Sorry that was Todd W.:
    > http://groups-beta.google.com/group/comp.lang.perl.misc/browse_thread/thread/a1d24b4eec251e80/
    >
    > Anyway, does anyone have any ideas how I can get it to stop converting
    > & n b s p ; and other standard HTML entities to gibberish?
    >
    >
    > Mike
     
    afrinspray, Oct 27, 2006
    #5
  6. afrinspray

    afrinspray Guest

    Todd W wrote:
    > 1: you are not exporting the data from perl as UTF8
    > and/or
    > 2: your document reader is either not configured to or capable of rendering
    > UTF8.
    > ...
    > binmode STDOUT, ":utf8";
    > ...
    > <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
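The two causes quoted above can be illustrated with a minimal sketch that uses only the core Encode module. It assumes the parser has already turned the entity into the character U+2019, and that the output is written as UTF-8 but displayed as ISO-8859-1. The helper name misread_as_latin1 is hypothetical and not part of any module in this thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode a character string as UTF-8, then decode the
# resulting bytes as ISO-8859-1, the way a misconfigured reader would.
sub misread_as_latin1 {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return decode('ISO-8859-1', $bytes);
}

my $rsquo = "\x{2019}";   # the character that &rsquo; and &#8217; stand for
my $nbsp  = "\x{00A0}";   # the character that &nbsp; stands for

# List the UTF-8 bytes of U+2019 in hex.
my $hex = join ' ', map { sprintf '%02X', ord($_) }
          split //, encode('UTF-8', $rsquo);

# Read as Latin-1, the first byte (E2) is "a with a hat" (U+00E2); the
# other two bytes land on invisible control characters.
my $seen      = misread_as_latin1($rsquo);
my $seen_nbsp = misread_as_latin1($nbsp);

print  "UTF-8 bytes of U+2019: $hex\n";
printf "first misread character of rsquo: U+%04X\n", ord(substr($seen, 0, 1));
printf "first misread character of nbsp:  U+%04X\n", ord(substr($seen_nbsp, 0, 1));
```

In other words, the "gibberish" is most likely correct UTF-8 being shown under the wrong charset, not damage done by the SAX filter.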



    Thanks so much for your reply! I totally understand what's going on
    now. My problem is a combination of both one and two. I was getting
    the "Wide character in print" warning as well, so I must not be
    exporting to UTF-8 correctly. Also, I'm reading the content in both
    Firefox and IE, so I'll have to add the charset to the meta tag as you
    did above.
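A minimal sketch of two ways to avoid the "Wide character in print" warning, assuming the text is a Perl character string that contains U+2019. Only core modules are used. The variable names and the in-memory filehandle are illustrative and do not come from the original code.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "It\x{2019}s fixed";   # character string with a right single quote

# Option 1: put an encoding layer on the handle so print() encodes for us.
# The quoted one-liner used ":utf8"; ":encoding(UTF-8)" is the stricter layer.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";

# Option 2: encode explicitly and write the resulting bytes yourself.
# An in-memory filehandle stands in for a real output file here.
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";
print {$out} encode('UTF-8', $text);
close $out;

printf "%d characters became %d bytes\n", length($text), length($buffer);
```

Either option covers only the first cause. The page still has to declare its charset, as in the meta tag quoted above, for the browser to decode the bytes correctly.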

    Thanks again for your help,
    Mike
     
    afrinspray, Nov 2, 2006
    #6
