Encoding question

Discussion in 'Perl Misc' started by Michael Krueger, Jun 18, 2004.

  1. Hi,
    I have a text-based application and want to draw some kind of frame on
    the screen. The OS is Debian/Linux, using Perl 5.6.

    I'm using this code:

    -- snip ---
    my $top = chr(201);
    my $bottom = chr(200);
    for (my $i = 0; $i < ($termCols-2); $i++)
    {
    $top .= chr(205);
    $bottom .= chr(205);
    }
    $top .= chr(187);
    $bottom .= chr(188);

    $term->Tgoto('cm', 0, 0, *STDOUT);
    print $top;
    for (my $i = 1; $i < ($termRows-1); $i++)
    {
    $term->Tgoto('cm', 0, $i, *STDOUT);
    print chr(186);
    $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
    print chr(186);
    }
    $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
    print $bottom;
    -- snip --

    Where $termCols and $termRows are the current terminal columns and lines.

    Problem:
    Due to the Latin-1 encoding I don't get the expected frame symbols but
    some accented characters instead.

    How can I change the encoding so that I can use the extended ASCII set
    that is often referred to as the most common one (e.g. on
    www.asciitable.com) and that contains these frame symbols?
    I'm aware of 'use encoding "..";' but I just can't find the correct table. :(

    michael
    Michael Krueger, Jun 18, 2004
    #1

  2. Ben Morrow

    Quoth Michael Krueger <-berlin.de>:
    > Hi,
    > I have a text based application and want to draw some kind of frame, on
    > the screen. OS is Debian/Linux using Perl 5.6
    >
    > I'm using this code:
    >
    > -- snip ---
    > my $top = chr(201);
    > my $bottom = chr(200);
    > for (my $i = 0; $i < ($termCols-2); $i++)


    for my $i (0 .. $termCols-3) {

    is much more Perlish...

    > {
    > $top .= chr(205);
    > $bottom .= chr(205);
    > }
    > $top .= chr(187);
    > $bottom .= chr(188);


    ...but even more so would be

    my $top = chr(201) . (chr(205) x ($termCols - 2)) . chr(187);
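
    A quick way to convince yourself that the two forms agree (a minimal
    sketch; the value 10 for $termCols is an arbitrary stand-in, and the
    loop has to run $termCols - 2 times, once per column between the
    corners):

```perl
use strict;
use warnings;

my $termCols = 10;   # stand-in value for illustration

# Loop version: one chr(205) per column between the two corners.
my $loop_top = chr(201);
$loop_top .= chr(205) for 1 .. $termCols - 2;
$loop_top .= chr(187);

# Repetition-operator version.
my $x_top = chr(201) . (chr(205) x ($termCols - 2)) . chr(187);

print "same string, length ", length($x_top), "\n" if $loop_top eq $x_top;
```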

    > $term->Tgoto('cm', 0, 0, *STDOUT);


    I'm not sure which class these methods are from, but you might consider
    using Term::ANSIScreen instead...

    > print $top;
    > for (my $i = 1; $i < ($termRows-1); $i++)
    > {
    > $term->Tgoto('cm', 0, $i, *STDOUT);
    > print chr(186);
    > $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
    > print chr(186);
    > }
    > $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
    > print $bottom;
    > -- snip --
    >
    > Where $termCols and $termRows are the current terminal lines and columns.
    >
    > Problem:
    > Due to the encoding to latin-1 charset I didn't get the expected
    > frame-symbols but some other accentuated(?) chars.


    The first thing to say is that if you want to mess with encodings,
    upgrade to perl 5.8. 5.8 supports Unicode properly, and through that all
    other encodings. The encoding pragma you mention only works in 5.8 (and
    doesn't do what I think you think it does: it changes the encoding your
    *program source* is considered to be in: i.e. the encoding of string
    literals in the source).

    There are, potentially, three encodings in use here: the one perl uses
    to convert the numbers in your source into characters, the one perl uses
    to convert the characters back to numbers again to send to the terminal,
    and the one the terminal uses to decide which glyph to draw.

    An easy and straightforward way to get rid of the first is to use
    "\N{...}" (with use charnames ':full') instead of chr, and look up the
    correct characters in the Big Ol' Unicode Character List
    <http://www.unicode.org/charts/>. You control the second using the
    :encoding layer on filehandles: see perldoc -f binmode, perldoc
    PerlIO::encoding.
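
    As a rough sketch of those two steps, assuming perl 5.8 or later: the
    characters are named via charnames, and the output layer determines the
    bytes that reach the terminal. utf8_bytes_hex is a made-up helper for
    illustration, not part of any module.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{...} on perl 5.8
use Encode qw(encode);

# Name the characters instead of relying on chr() and a code page.
my $corner = "\N{BOX DRAWINGS DOUBLE DOWN AND RIGHT}";   # U+2554
my $hline  = "\N{BOX DRAWINGS DOUBLE HORIZONTAL}";       # U+2550

# Hypothetical helper: the bytes a UTF-8 output layer would emit, in hex.
sub utf8_bytes_hex {
    my ($char) = @_;
    my $bytes = encode('UTF-8', $char);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# The output layer decides which bytes reach the terminal.
binmode STDOUT, ':encoding(UTF-8)';
printf "U+%04X is sent as %s\n", ord($corner), utf8_bytes_hex($corner);
print $corner, $hline x 5, "\n";
```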

    The third is I think your problem here: your terminal is expecting
    Latin-1 (entirely usual in the Unix world) and there are no box drawing
    characters in Latin-1. Your best answer is to persuade your terminal to
    want utf8 instead (unicode_start on the console, xterm -u8, most other
    terminal emulators will support it with an option); then you can call
    binmode STDOUT, ':utf8' and use the Unicode box-drawing characters.
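
    A minimal sketch of that approach, assuming the terminal is already in
    UTF-8 mode. unicode_frame is a hypothetical helper and the frame size is
    arbitrary; cursor positioning is left out to keep it short.

```perl
use strict;
use warnings;
use charnames ':full';

# Build the frame as a list of lines; $cols and $rows are the outer size.
sub unicode_frame {
    my ($cols, $rows) = @_;
    my $h = "\N{BOX DRAWINGS DOUBLE HORIZONTAL}" x ($cols - 2);
    my $v = "\N{BOX DRAWINGS DOUBLE VERTICAL}";
    my $top    = "\N{BOX DRAWINGS DOUBLE DOWN AND RIGHT}" . $h
               . "\N{BOX DRAWINGS DOUBLE DOWN AND LEFT}";
    my $bottom = "\N{BOX DRAWINGS DOUBLE UP AND RIGHT}" . $h
               . "\N{BOX DRAWINGS DOUBLE UP AND LEFT}";
    my $middle = $v . (' ' x ($cols - 2)) . $v;
    return ($top, ($middle) x ($rows - 2), $bottom);
}

binmode STDOUT, ':utf8';   # the terminal must be in UTF-8 mode too
print "$_\n" for unicode_frame(20, 5);
```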

    Ben

    --
    $.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
    $x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
    {$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t #
    $J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.
    Ben Morrow, Jun 18, 2004
    #2

  3. Ian Wilson

    Michael Krueger wrote:

    > Hi,
    > I have a text based application and want to draw some kind of frame, on
    > the screen. OS is Debian/Linux using Perl 5.6
    >
    > I'm using this code:
    >
    > -- snip ---
    > my $top = chr(201);
    > my $bottom = chr(200);
    > for (my $i = 0; $i < ($termCols-2); $i++)
    > {
    > $top .= chr(205);
    > $bottom .= chr(205);
    > }
    > $top .= chr(187);
    > $bottom .= chr(188);
    >
    > $term->Tgoto('cm', 0, 0, *STDOUT);
    > print $top;
    > for (my $i = 1; $i < ($termRows-1); $i++)
    > {
    > $term->Tgoto('cm', 0, $i, *STDOUT);
    > print chr(186);
    > $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
    > print chr(186);
    > }
    > $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
    > print $bottom;
    > -- snip --
    >
    > Where $termCols and $termRows are the current terminal lines and columns.
    >
    > Problem:
    > Due to the encoding to latin-1 charset I didn't get the expected
    > frame-symbols but some other accentuated(?) chars.
    >
    > How can I change the encoding that I can use the extended ASCII set, which
    > is referred often as the most common e.g. on www.asciitable.com, which
    > contains these frame-symbols?


    The code set is probably "Code Page 437" variously referred to as
    "cp437", "IBM437", "437" etc. There are national variants too which have
    some or all of the same line-draw characters but include a few accented
    characters or national currency symbols in place of some US characters.

    All those line-draw characters are also in Unicode - this and UTF-8 may
    be a better option.

    See http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT

    Vim supports editing Unicode characters in UTF-8 files: e.g., I seem to
    recall that Control-K dr produces a top-left corner (mnemonic: down,
    right), Control-K vv produces a vertical line, and so on.

    > I'm aware of 'use encoding "..";' but I just can't find the correct table. :(
    >


    Googling reveals snippets such as
    binmode (STDOUT, ':encoding(cp437)');
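
    To see which Unicode characters the numbers from the original code stand
    for in CP437, here is a small sketch using the core Encode module
    (cp437_to_unicode is a hypothetical helper name). In Latin-1 the same
    byte values are É, Í, », º, È and ¼, which would explain the accented
    characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: which Unicode code point does a CP437 byte stand for?
sub cp437_to_unicode {
    my ($byte) = @_;
    my $octet = chr $byte;
    return ord decode('cp437', $octet);
}

# The byte values used for the frame in the original post.
for my $n (201, 205, 187, 186, 200, 188) {
    printf "cp437 %3d -> U+%04X\n", $n, cp437_to_unicode($n);
}
```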

    You need to match the encoding to your display device: on a Linux console
    you probably need to check the "locale" settings (LANG etc.) and some
    other stuff.

    If using a terminal emulator you need to choose an appropriate font. On
    Windows that might be "Terminal" for IBM437 or "Courier New" for Unicode.
    Ian Wilson, Jun 18, 2004
    #3
  4. On Fri, 18 Jun 2004, Ian Wilson wrote:

    > The code set is probably "Code Page 437" variously referred to as
    > "cp437", "IBM437", "437" etc. There are national variants too


    Er, excuse me, but cp437 -is- the national (USA) variant. The Latin
    multilingual codepage is cp850.

    > All those line-draw characters are also in Unicode - this and UTF-8 may
    > be a better option.


    By now I'm sure that's the best advice, unless there are some special
    factors involved.
    Alan J. Flavell, Jun 18, 2004
    #4
  5. Ian Wilson

    Alan J. Flavell wrote:

    > On Fri, 18 Jun 2004, Ian Wilson wrote:
    >
    >
    >>The code set is probably "Code Page 437" variously referred to as
    >>"cp437", "IBM437", "437" etc. There are national variants too

    >
    >
    > Er, excuse me, but cp437 -is- the national (USA) variant.


    Picky, but also wrong :)
    in my post s/there are national/there are other national/

    in your post s/the national variant/a national variant/
    (at least from where I'm standing, YMMV)


    > The Latin multilingual codepage is cp850.


    Alright but the OP referred to http://www.asciitable.com/ which shows
    CP437.

    I haven't checked every codepoint in the bit described as "Extended
    ASCII" but point 184 looks to me like 437 rather than 850. I can't say I
    like that page much anyhow.
    Ian Wilson, Jun 18, 2004
    #5
  6. On Fri, 18 Jun 2004, Ian Wilson wrote:

    > >>The code set is probably "Code Page 437" variously referred to as
    > >>"cp437", "IBM437", "437" etc. There are national variants too

    > >
    > > Er, excuse me, but cp437 -is- the national (USA) variant.

    >
    > Picky, but also wrong :)
    > in my post s/there are national/there are other national/


    Fine, I'll go with that...

    > in your post s/the national variant/a national variant/


    Rather, s/the national (USA) variant/the USA national variant/
    , to address your nitpick in the way that I had intended.

    Way back (e.g. this old MS-DOS 5 manual which I have on the shelf),
    cp437 was advertised as the "English" code page; but already by the
    time of the public release of Win95 (as opposed to the beta, where I
    had chosen to change the codepage to 850 for myself, despite the dire
    warnings in the covering notes), MS were setting the DOS codepage as
    cp850 for Latin-based locales. As far as I know (though I could be
    wrong) they were still setting cp437 in the USA, though.

    > > The Latin multilingual codepage is cp850.

    >
    > Alright but the OP referred to http://www.asciitable.com/ which shows
    > CP437.


    Sure, I wasn't arguing about that part of the posting.

    > I haven't checked every codepoint in the bit described as "Extended
    > ASCII"


    ...a term which always sets off the bogosity alarms. There are
    *numerous* 8-bit character codings which contain ASCII as their first
    half.

    > but point 184 looks to me like 437 rather than 850.


    Indeed. The "Extended ASCII" bogon *does* usually refer to cp437 in
    my experience.

    > I can't say I like that page much anyhow.


    Me too neither. For one thing, its claim that "it took a while to get
    a single standard for these extra characters" is complete nonsense.

    all the best
    Alan J. Flavell, Jun 18, 2004
    #6
  7. On Fri, 18 Jun 2004, Ben Morrow wrote:

    >
    > Quoth Michael Krueger <-berlin.de>:
    > > Hi,
    > > I have a text based application and want to draw some kind of frame, on
    > > the screen. OS is Debian/Linux using Perl 5.6
    > >
    > > I'm using this code:
    > >
    > > -- snip ---
    > > my $top = chr(201);
    > > my $bottom = chr(200);
    > > for (my $i = 0; $i < ($termCols-2); $i++)

    >
    > for my $i (0 .. $termCols-3) {
    >
    > is much more Perlish...
    >
    > > {
    > > $top .= chr(205);
    > > $bottom .= chr(205);
    > > }
    > > $top .= chr(187);
    > > $bottom .= chr(188);

    >
    > ...but even more so would be
    >
    > my $top = chr(201) . (chr(205) x ($termCols - 2)) . chr(187);
    >
    > > $term->Tgoto('cm', 0, 0, *STDOUT);

    >
    > I'm not sure which class these methods are from, but you might consider
    > using Term::ANSIScreen instead...
    >
    > > print $top;
    > > for (my $i = 1; $i < ($termRows-1); $i++)
    > > {
    > > $term->Tgoto('cm', 0, $i, *STDOUT);
    > > print chr(186);
    > > $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
    > > print chr(186);
    > > }
    > > $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
    > > print $bottom;
    > > -- snip --
    > >
    > > Where $termCols and $termRows are the current terminal lines and columns.
    > >
    > > Problem:
    > > Due to the encoding to latin-1 charset I didn't get the expected
    > > frame-symbols but some other accentuated(?) chars.

    >
    > The first thing to say is that if you want to mess with encodings,
    > upgrade to perl 5.8. 5.8 supports Unicode properly, and through that all
    > other encodings. The encoding pragma you mention only works in 5.8 (and
    > doesn't do what I think you think it does: it changes the encoding your
    > *program source* is considered to be in: i.e. the encoding of string
    > literals in the source).
    >
    > There are, potentially, three encodings in use here: the one perl uses
    > to convert the numbers in your source into characters, the one perl uses
    > to convert the characters back to numbers again to send to the terminal,
    > and the one the terminal uses to decide which glyph to draw.
    >
    > An easy and straightforward way to get rid of the first is to use
    > "\N{...}" instead of chr, and look up the correct characters in the Big
    > Ol' Unicode Character List <http://www.unicode.org/charts/>. You control
    > the second using the :encoding layer on filehandles: see perldoc -f
    > binmode, perldoc PerlIO::encoding.
    >
    > The third is I think your problem here: your terminal is expecting
    > Latin-1 (entirely usual in the Unix world) and there are no box drawing
    > characters in Latin-1. Your best answer is to persuade your terminal to
    > want utf8 instead (unicode_start on the console, xterm -u8, most other
    > terminal emulators will support it with an option); then you can call
    > binmode STDOUT, ':utf8' and use the Unicode box-drawing characters.
    >
    > Ben
    >
    > --
    > $.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
    > $x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
    > {$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t #
    > $J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.
    >


    Hi,
    Thanks for your fast reply; this really helped me a lot.
    I'll try it with Unicode then.

    Just want to draw those darn boxes ; )

    michael
    Michael Krueger, Jun 18, 2004
    #7
  8. Michael Krueger <-berlin.de> wrote:


    > On Fri, 18 Jun 2004, Ben Morrow wrote:
    > thx for your fast reply, this really helped me alot.
    > I'll try it with Unicode then.


    > Just want to draw those darn boxes ; )


    He gave poor advice however. Most of the interesting terminals support
    line-drawing, which any termcap interface (such as the one in Perl) can
    support.
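
    As a rough illustration of terminal line-drawing, the sketch below
    hardcodes the escape sequences that xterm and other VT100-style
    terminals commonly use to switch to the line-drawing character set.
    That is an assumption made for brevity: a termcap-based program would
    look up the 'as', 'ae' and 'ac' capabilities instead. vt100_frame is a
    hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: a VT100-compatible terminal such as xterm. A termcap-aware
# program would read these strings from the terminal database.
my $ACS_ON  = "\e(0";   # select the DEC special graphics set
my $ACS_OFF = "\e(B";   # switch back to ASCII

# In the graphics set, l/k/m/j are the four corners, q is the horizontal
# line and x is the vertical line.
sub vt100_frame {
    my ($cols, $rows) = @_;
    my $h      = 'q' x ($cols - 2);
    my $middle = 'x' . (' ' x ($cols - 2)) . 'x';
    return map { $ACS_ON . $_ . $ACS_OFF }
        ("l${h}k", ($middle) x ($rows - 2), "m${h}j");
}

print "$_\n" for vt100_frame(20, 5);
```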

    The current version of ncurses is 5.4 (20040208).
    There's an FAQ at
    http://invisible-island.net/ncurses/ncurses.faq.html

    --
    Thomas E. Dickey
    http://invisible-island.net
    ftp://invisible-island.net
    Thomas Dickey, Jun 19, 2004
    #8
  9. Ben Morrow

    Quoth Thomas Dickey <>:
    > Michael Krueger <-berlin.de> wrote:
    >
    >
    > > On Fri, 18 Jun 2004, Ben Morrow wrote:
    > > thx for your fast reply, this really helped me alot.
    > > I'll try it with Unicode then.

    >
    > > Just want to draw those darn boxes ; )

    >
    > He gave poor advice however. Most of the interesting terminals support
    > line-drawing, which any termcap interface (such as the one in Perl) can
    > support.


    Ah, I didn't know that... filed for future reference. Thank you.

    FWIW, I always do boxes just with '+', '-' and '|'...
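
    For completeness, a minimal sketch of that plain-ASCII fallback
    (ascii_frame is a made-up helper name; it should work regardless of the
    terminal's encoding):

```perl
use strict;
use warnings;

# Build an ASCII frame as a list of lines; $cols and $rows are the outer size.
sub ascii_frame {
    my ($cols, $rows) = @_;
    my $edge   = '+' . ('-' x ($cols - 2)) . '+';
    my $middle = '|' . (' ' x ($cols - 2)) . '|';
    return ($edge, ($middle) x ($rows - 2), $edge);
}

print "$_\n" for ascii_frame(12, 4);
```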

    Ben

    --
    perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
    qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
    1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
    2047502190/' #
    Ben Morrow, Jun 19, 2004
    #9
