Data::Dumper vs. UTF-8, as usual

Discussion in 'Perl Misc' started by jidanni, Mar 21, 2011.

  1. jidanni

    jidanni Guest

    Gentlemen, I need to use
    use utf8;
    use open qw/:std :encoding(utf8)/;
    in my program, but it has the side effect of causing
    print Dumper "龔";
    to print
    $VAR1 = "\x{9f94}";
    instead of
    $VAR1 = "龔";
    like it would otherwise. I dare not touch the 'use' stuff, so how can I
    tweak this?:
    use strict;
    use warnings FATAL => 'all';
    use open qw/:std :encoding(utf8)/;
    use utf8;
    use Data::Dumper;
    print Dumper "龔";
    jidanni, Mar 21, 2011
    #1

  2. On 2011-03-21, jidanni <> wrote:
    > Gentlemen, I need to use
    > use utf8;
    > use open qw/:std :encoding(utf8)/;
    > in my program, but it has the side effect of causing
    > print Dumper "?";
    > to print
    > $VAR1 = "\x{9f94}";
    > instead of
    > $VAR1 = "?";
    > like it would otherwise. I dare not touch the 'use' stuff, so how can I
    > tweak this?:
    > use strict;
    > use warnings FATAL => 'all';
    > use open qw/:std :encoding(utf8)/;
    > use utf8;
    > use Data::Dumper;
    > print Dumper "?";


    What are you using for editing your files? Are you sure you use a
    real question mark? Check with
    od -tx1a -Ax your_script.pl

    I see no problem here with 5.8.8,
    Ilya
    Ilya Zakharevich, Mar 21, 2011
    #2

  3. On 2011-03-21 22:44, Ilya Zakharevich <> wrote:
    > On 2011-03-21, jidanni <> wrote:
    >> Gentlemen, I need to use
    >> use utf8;
    >> use open qw/:std :encoding(utf8)/;
    >> in my program, but it has the side effect of causing
    >> print Dumper "?";
    >> to print
    >> $VAR1 = "\x{9f94}";
    >> instead of
    >> $VAR1 = "?";
    >> like it would otherwise. I dare not touch the 'use' stuff, so how can I
    >> tweak this?:
    >> use strict;
    >> use warnings FATAL => 'all';
    >> use open qw/:std :encoding(utf8)/;
    >> use utf8;
    >> use Data::Dumper;
    >> print Dumper "?";

    >
    > What are you using for editing your files? Are you sure you use a
    > real question mark?


    The only question mark in jidanni's posting was at the end of "how can I
    tweak this?". The character jidanni wants to be displayed is a CJK
    character: http://unicode.org/cgi-bin/GetUnihanData.pl?codepoint=9F94

    > I see no problem here with 5.8.8,


    I see a problem with your newsreader ;-).


    Unfortunately I don't know a solution for the OP's problem. This may be
    a case where writing a custom dumping routine (and uploading it to CPAN)
    may be worthwhile.

    hp
    Peter J. Holzer, Mar 21, 2011
    #3
  4. jidanni wrote:
    > Gentlemen, I need to use
    > use utf8;
    > use open qw/:std :encoding(utf8)/;
    > in my program, but it has the side effect of causing
    > print Dumper "龔";


    Without "use utf8", Perl interprets that character as just 3 bytes
    and prints those three bytes; it is your terminal that converts them
    back into the character that you see. If you were to print the
    length, rather than the output of Dumper, you would see the difference
    that "use utf8" makes.
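
To make the length difference concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8, and it spells the character out as explicit bytes and decodes them with Encode, so the result does not depend on how this page is encoded. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three UTF-8 bytes of U+9F94: what Perl sees in the string literal
# when "use utf8" is NOT in effect.
my $bytes = "\xE9\xBE\x94";

# The single character Perl sees when "use utf8" IS in effect
# (simulated here by decoding the bytes explicitly).
my $chars = decode('UTF-8', $bytes);

printf "without utf8: length %d\n", length $bytes;
printf "with utf8:    length %d\n", length $chars;
printf "code point:   U+%04X\n",    ord $chars;
```

The first length is 3 and the second is 1, which is why Data::Dumper treats the two strings differently.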

    As far as I can tell, the "use open" part makes no difference, other
    than to silence a warning about wide characters.

    > to print
    > $VAR1 = "\x{9f94}";
    > instead of
    > $VAR1 = "龔";
    > like it would otherwise. I dare not touch the 'use' stuff, so how can I
    > tweak this?:


    Without writing your own version of Data::Dumper (or extending/fixing
    the current one), or doing something basically equivalent, I don't see
    how you can. However, my version of Data::Dumper is rather old; maybe
    it has already been tweaked in the meantime. It could use something
    like a (hypothetical) $Data::Dumper::Useutf8 setting.
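
One way of "doing something basically equivalent" is to post-process Dumper's output instead of replacing the module. The sketch below is an assumption, not a Data::Dumper feature: dump_readable is a hypothetical helper name, and the result is meant for human eyes only, not for feeding back to eval.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not part of Data::Dumper): dump a value, then turn
# the \x{...} escapes back into the characters they stand for.
sub dump_readable {
    my ($value) = @_;
    my $out = Dumper($value);
    # Match an escaped backslash first, so that a literal "\x{41}" in the
    # data (which Dumper writes as "\\x{41}") is left untouched.
    $out =~ s/(\\\\)|\\x\{([0-9a-fA-F]+)\}/defined $1 ? $1 : chr(hex($2))/ge;
    return $out;
}

my $dumped = dump_readable("\x{9f94}");

# Encode on the way out, much like "use open qw/:std :encoding(utf8)/" does.
binmode STDOUT, ':encoding(UTF-8)';
print $dumped;
```

The substitution only touches the text that Dumper has already produced, so the 'use' lines in the original script can stay as they are.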

    Xho
    Xho Jingleheimerschmidt, Mar 22, 2011
    #4
  5. On 2011-03-21 23:56, Peter J. Holzer <> wrote:
    > On 2011-03-21 22:44, Ilya Zakharevich <> wrote:
    >> On 2011-03-21, jidanni <> wrote:
    >>> Gentlemen, I need to use
    >>> use utf8;
    >>> use open qw/:std :encoding(utf8)/;
    >>> in my program, but it has the side effect of causing
    >>> print Dumper "?";

    [? was a Chinese character in the OP]
    >>> to print
    >>> $VAR1 = "\x{9f94}";
    >>> instead of
    >>> $VAR1 = "?";
    >>> like it would otherwise. I dare not touch the 'use' stuff, so how can I
    >>> tweak this?:

    [...]
    > Unfortunately I don't know a solution for the OP's problem. This may be
    > a case where writing a custom dumping routine (and uploading it to CPAN)
    > may be worthwhile.


    Forgot to add: It also depends very much on what Data::Dumper is used
    for in the OP's script: Is the output supposed to be readable by humans
    or by other programs? Is the output only used for debugging purposes or
    is it part of the "real" output of the program?

    hp
    Peter J. Holzer, Mar 22, 2011
    #5
  6. On 2011-03-21, Peter J. Holzer <> wrote:
    >>> use Data::Dumper;
    >>> print Dumper "?";

    >>
    >> What are you using for editing your files? Are you sure you use a
    >> real question mark?

    >
    > The only question mark in jidanni's posting was at the end of "how can I
    > tweak this?". The character jidanni wants to be displayed is a CJK
    > character: http://unicode.org/cgi-bin/GetUnihanData.pl?codepoint=9F94
    >
    >> I see no problem here with 5.8.8,

    >
    > I see a problem with your newsreader ;-).


    I do not see any problem with it. It is told that the TTY understands
    latin-1, and performs accordingly. The real problem is with wetware -
    I could have guessed that this question mark is not \x3f...
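
The effect described above can be reproduced in Perl. This is only an illustration of what happens when a character has no Latin-1 representation, using the Encode module's default substitution behaviour; it makes no claim about how any particular newsreader is implemented.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{9f94}";    # the CJK character from the original post

# Latin-1 cannot represent U+9F94, so Encode falls back to its
# substitution character for that encoding.
my $latin1 = encode('ISO-8859-1', $char);

# UTF-8 can represent it, as three bytes.
my $utf8 = encode('UTF-8', $char);

printf "latin-1: %s (hex %s)\n", $latin1, unpack('H*', $latin1);
printf "utf-8 bytes: hex %s\n", unpack('H*', $utf8);
```

The Latin-1 result is a genuine \x3f byte, which is why a later reader cannot tell it apart from a real question mark.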

    Thanks,
    Ilya
    Ilya Zakharevich, Mar 23, 2011
    #6
