How to avoid \x{...} when converting unicode to latin1?

Discussion in 'Perl Misc' started by Jochen Lehmeier, Jul 21, 2009.

  1. Hello,

    here is a test script that outputs a unicode string which cannot be
    represented in latin1 to a latin1-encoded file:

    my $unicode="hello \x{010d} world";
    binmode STDOUT,":encoding(latin1)";
    print $unicode,"\n";

    The output of Perl 5.8.8 is:

    "\x{010d}" does not map to iso-8859-1 at test.pl line 3.
    hello \x{010d} world

    So far so good. The output is perfectly fine, as expected.

    Is there a way to achieve the output "hello ? world" instead? Having
    non-representable characters replaced by the \x{} notation does not help
    much, in my case. Non-technical users will not understand this, but they
    will understand the "?" (or even Latin-1 \xbf, the inverted question mark).

    Important: I would like to achieve this on the I/O level, either while
    opening the file handle, or even with a global setting that catches all
    handles opened later. It would be trivial to remove the \x{} using regular
    expressions, but that would mean I'd have to make lots of changes to my
    scripts.

    Thanks in advance!
     
    Jochen Lehmeier, Jul 21, 2009
    #1

  2. Bo Lindbergh Guest

    Read the documentation for PerlIO::encoding and note the part about
    $PerlIO::encoding::fallback. You clear the PERLQQ bit like this:

    use PerlIO::encoding;

    $PerlIO::encoding::fallback &= ~ Encode::PERLQQ();
    binmode(STDOUT, ":encoding(iso-latin-1)");
    print "Hello, World!\x{2122}\n";

    If you want to silence the warning, clear the WARN_ON_ERR bit as well:

    $PerlIO::encoding::fallback &=
        ~ (Encode::PERLQQ() | Encode::WARN_ON_ERR());
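
    A minimal, self-contained sketch of the above, writing to an in-memory
    handle so that the resulting bytes can be inspected. The variable names
    ($buffer, $out) are only illustrative, and the sketch assumes that the
    fallback value is picked up when the :encoding layer is pushed, which is
    why the variable is changed before the handle is opened.

```perl
use strict;
use warnings;
use Encode ();          # provides the PERLQQ and WARN_ON_ERR flag constants
use PerlIO::encoding;   # provides $PerlIO::encoding::fallback

# Clear PERLQQ (the \x{...} escapes) and WARN_ON_ERR (the warning).
# Assumption: this has to happen before the :encoding layer is pushed.
$PerlIO::encoding::fallback &= ~(Encode::PERLQQ() | Encode::WARN_ON_ERR());

# Write through a Latin-1 encoding layer into an in-memory buffer.
my $buffer = '';
open(my $out, '>:encoding(iso-8859-1)', \$buffer)
    or die "open failed: $!";
print {$out} "hello \x{010d} world\n";
close($out) or die "close failed: $!";

# $buffer now holds the encoded bytes; the unmappable character
# is replaced by the encoding's substitution character.
print $buffer;
```

    Placing the assignment near the top of a script should affect handles
    opened afterwards, which is the kind of global setting asked for; it is
    worth verifying this on the Perl version actually in use.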


    /Bo Lindbergh
     
    Bo Lindbergh, Jul 21, 2009
    #2

  3. On Tue, 21 Jul 2009 15:57:55 +0200, Bo Lindbergh wrote:

    > Read the documentation for PerlIO::encoding and note the part about
    > $PerlIO::encoding::fallback.


    I did notice that part, but the documentation is quite terse around that
    topic; as far as I can tell it is not possible to arrive at your solution
    from there. The PERLQQ and other constants seem to be documented in Encode
    (albeit as "Encode::FB_PERLQQ", not "Encode::PERLQQ()", and in the
    Malformed Data chapter - my problem is not related to malformed data at
    all) while $PerlIO::encoding::fallback is mentioned in the very short
    PerlIO::encoding. Neither place mentions the other, as far as I can tell.
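
    For reference, the two spellings are related: the Encode documentation
    lists FB_PERLQQ as the combination of the single-bit flags PERLQQ and
    LEAVE_SRC. A small sketch that prints the values, assuming an Encode
    version that provides both the FB_* constants and the individual flags
    (the variable names are illustrative):

```perl
use strict;
use warnings;
use Encode ();

# FB_PERLQQ is documented as PERLQQ | LEAVE_SRC.
my $perlqq    = Encode::PERLQQ();
my $fb_perlqq = Encode::FB_PERLQQ();
my $combined  = Encode::PERLQQ() | Encode::LEAVE_SRC();

printf "PERLQQ    = 0x%04x\n", $perlqq;
printf "FB_PERLQQ = 0x%04x\n", $fb_perlqq;
printf "FB_PERLQQ contains the PERLQQ bit: %s\n",
    (($fb_perlqq & $perlqq) == $perlqq ? "yes" : "no");
```

    This is why clearing the PERLQQ bit alone is enough to switch off the
    \x{...} escapes in $PerlIO::encoding::fallback.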

    If whoever maintains that documentation reads this - maybe you could add
    Bo's example in the appropriate place. Not complaining, just pointing out
    a source of trouble.

    > You clear the PERLQQ bit like this:
    > $PerlIO::encoding::fallback &= ~ Encode::PERLQQ();


    Awesome - problem solved.

    > If you want to silence the warning, clear the WARN_ON_ERR bit as well:
    >
    > $PerlIO::encoding::fallback &=
    > ~ (Encode::PERLQQ() | Encode::WARN_ON_ERR());


    Aye, useful.

    Do you happen to know whether it is possible to change that replacement
    character from "?" to another one, as well?

    Thank you!
     
    Jochen Lehmeier, Jul 21, 2009
    #3
  4. Bo Lindbergh Guest

    In article <op.uxe9z2y3mk9oye@frodo>,
    "Jochen Lehmeier" wrote:

    > Do you happen to know whether it is possible to change that replacement
    > character from "?" to another one, as well?


    Not easily. Setting $PerlIO::encoding::fallback to a coderef ought to work,
    but it makes PerlIO::encoding do very strange things. You might have to
    define your own encoding (see Encode::Encoding).
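
    One possible workaround that sidesteps the I/O layer, sketched under the
    assumption that encoding the string by hand before printing is acceptable:
    Encode::encode accepts a code reference as its CHECK argument (Encode 2.12
    or later), which is called with the ordinal of each unmappable character
    and returns the octets to emit instead. The helper name
    to_latin1_with_marker is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode to Latin-1, emitting the byte \xBF
# (inverted question mark) for every character that has no mapping.
sub to_latin1_with_marker {
    my ($text) = @_;
    return Encode::encode('iso-8859-1', $text, sub { "\xBF" });
}

my $bytes = to_latin1_with_marker("hello \x{010d} world");

# The string is already encoded, so write it out as raw bytes.
binmode(STDOUT);
print $bytes, "\n";
```

    This does not meet the "I/O level" requirement from the original question,
    since every print has to go through the helper; it only shows that the
    replacement character itself can be chosen.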


    /Bo Lindbergh
     
    Bo Lindbergh, Jul 22, 2009
    #4
