Encoding question

Michael Krueger

Hi,
I have a text-based application and want to draw a frame on the
screen. The OS is Debian Linux, using Perl 5.6.

I'm using this code:

-- snip ---
my $top = chr(201);
my $bottom = chr(200);
for (my $i = 0; $i < ($termCols-2); $i++)
{
$top .= chr(205);
$bottom .= chr(205);
}
$top .= chr(187);
$bottom .= chr(188);

$term->Tgoto('cm', 0, 0, *STDOUT);
print $top;
for (my $i = 1; $i < ($termRows-1); $i++)
{
$term->Tgoto('cm', 0, $i, *STDOUT);
print chr(186);
$term->Tgoto('cm', $termCols-1, $i, *STDOUT);
print chr(186);
}
$term->Tgoto('cm', 0, $termRows-2, *STDOUT);
print $bottom;
-- snip --

where $termCols and $termRows are the terminal's current columns and rows.

Problem:
Because the output is treated as Latin-1, I don't get the expected
frame symbols but some accented characters instead.

How can I change the encoding so that I can use the extended ASCII set,
which is often described as the most common one (e.g. on www.asciitable.com)
and which contains these frame symbols?
I'm aware of 'use encoding "..";' but I just can't find the correct table. :(

michael
 
Ben Morrow

Quoth Michael Krueger said:
Hi,
I have a text based application and want to draw some kind of frame, on
the screen. OS is Debian/Linux using Perl 5.6

I'm using this code:

-- snip ---
my $top = chr(201);
my $bottom = chr(200);
for (my $i = 0; $i < ($termCols-2); $i++)

for my $i (1 .. $termCols-2) {

is much more Perlish...
{
$top .= chr(205);
$bottom .= chr(205);
}
$top .= chr(187);
$bottom .= chr(188);

...but even more so would be

my $top = chr(201) . (chr(205) x ($termCols - 2)) . chr(187);

$term->Tgoto('cm', 0, 0, *STDOUT);

I'm not sure which class these methods are from, but you might consider
using Term::ANSIScreen instead...

print $top;
for (my $i = 1; $i < ($termRows-1); $i++)
{
$term->Tgoto('cm', 0, $i, *STDOUT);
print chr(186);
$term->Tgoto('cm', $termCols-1, $i, *STDOUT);
print chr(186);
}
$term->Tgoto('cm', 0, $termRows-2, *STDOUT);
print $bottom;
-- snip --

Where $termCols and $termRows are the current terminal lines and columns.

Problem:
Due to the encoding to latin-1 charset I didn't get the expected
frame-symbols but some other accentuated(?) chars.

The first thing to say is that if you want to mess with encodings,
upgrade to perl 5.8. 5.8 supports Unicode properly, and through that all
other encodings. The encoding pragma you mention only works in 5.8 (and
doesn't do what I think you think it does: it changes the encoding your
*program source* is considered to be in: i.e. the encoding of string
literals in the source).

There are, potentially, three encodings in use here: the one perl uses
to convert the numbers in your source into characters, the one perl uses
to convert the characters back to numbers again to send to the terminal,
and the one the terminal uses to decide which glyph to draw.

An easy and straightforward way to get rid of the first is to use
"\N{...}" (with use charnames ':full') instead of chr, and look up the
correct characters in the Big Ol' Unicode Character List
<http://www.unicode.org/charts/>. You control the second using the
:encoding layer on filehandles: see perldoc -f binmode and
perldoc PerlIO::encoding.

The third is I think your problem here: your terminal is expecting
Latin-1 (entirely usual in the Unix world) and there are no box drawing
characters in Latin-1. Your best answer is to persuade your terminal to
want utf8 instead (unicode_start on the console, xterm -u8, most other
terminal emulators will support it with an option); then you can call
binmode STDOUT, ':utf8' and use the Unicode box-drawing characters.
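
As a rough sketch of the approach described above (assuming perl 5.8 or later and a terminal that really is in UTF-8 mode), the frame could be built from named Unicode characters. The sub name unicode_frame and the 20x5 size are made up for illustration, and the sketch prints plain lines rather than positioning the cursor with Tgoto as the original code does.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} names on older perls

# Build the rows of a double-line frame as a list of strings.
# $cols and $rows are the full outer size, border included.
# (unicode_frame is a hypothetical helper, not from the original post.)
sub unicode_frame {
    my ($cols, $rows) = @_;
    my $h = "\N{BOX DRAWINGS DOUBLE HORIZONTAL}" x ($cols - 2);
    my $v = "\N{BOX DRAWINGS DOUBLE VERTICAL}";
    my $top    = "\N{BOX DRAWINGS DOUBLE DOWN AND RIGHT}" . $h
               . "\N{BOX DRAWINGS DOUBLE DOWN AND LEFT}";
    my $bottom = "\N{BOX DRAWINGS DOUBLE UP AND RIGHT}" . $h
               . "\N{BOX DRAWINGS DOUBLE UP AND LEFT}";
    my $middle = $v . (' ' x ($cols - 2)) . $v;
    return ($top, ($middle) x ($rows - 2), $bottom);
}

# Tell perl to turn characters into UTF-8 bytes on output.
binmode STDOUT, ':utf8';
print "$_\n" for unicode_frame(20, 5);
```

If the terminal is still expecting Latin-1, this will print multi-byte garbage instead of a frame, which is the third encoding mentioned above.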

Ben
 
Ian Wilson

Michael said:
How can I change the encoding that I can use the extended ASCII set, which
is referred often as the most common e.g. on www.asciitable.com, which
contains these frame-symbols?

The code set is probably "Code Page 437" variously referred to as
"cp437", "IBM437", "437" etc. There are national variants too which have
some or all of the same line-draw characters but include a few accented
characters or national currency symbols in place of some US characters.

All those line-draw characters are also in Unicode - this and UTF-8 may
be a better option.

See http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT

Vim supports editing of Unicode characters in UTF-8 files. For example, ISTR
that Control-K dr produces a top-left corner (mnemonic: down, right),
Control-K vv produces a vertical line, and so on.

I'm aware of 'use encoding "..";' but I just can't find the correct table. :(

Googling reveals snippets such as
binmode (STDOUT, ':encoding(cp437)');

You need to match encodings with your display device. On a Linux console
you probably need to check the "locale" settings (LANG etc.) and some
other stuff.

If using a terminal emulator you need to choose an appropriate font. On
Windows that might be "Terminal" for IBM437 or "Courier New" for Unicode.
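
To tie the chr() numbers from the original post to their Unicode equivalents, here is a small sketch using the Encode module that ships with perl 5.8 and later. The helper name cp437_char is invented for this example. It decodes a CP437 byte into a character, which is the opposite direction from the binmode snippet above (that one encodes characters into CP437 bytes on output).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Interpret a CP437 byte value (the numbers used in the original post)
# as the Unicode character it stands for.
# (cp437_char is a hypothetical helper name.)
sub cp437_char {
    my ($byte) = @_;
    return decode('cp437', chr($byte));
}

# Output as UTF-8; assumes the terminal is in UTF-8 mode.
binmode STDOUT, ':utf8';
for my $byte (201, 205, 187, 186, 200, 188) {
    my $char = cp437_char($byte);
    printf "%3d -> U+%04X %s\n", $byte, ord($char), $char;
}
```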
 
Alan J. Flavell

The code set is probably "Code Page 437" variously referred to as
"cp437", "IBM437", "437" etc. There are national variants too

Er, excuse me, but cp437 -is- the national (USA) variant. The Latin
multilingual codepage is cp850.

All those line-draw characters are also in Unicode - this and UTF-8 may
be a better option.

By now I'm sure that's the best advice, unless there are some special
factors involved.
 
Ian Wilson

Alan said:
Er, excuse me, but cp437 -is- the national (USA) variant.

Picky, but also wrong :)
in my post s/there are national/there are other national/

in your post s/the national variant/a national variant/
(at least from where I'm standing, YMMV)

The Latin multilingual codepage is cp850.

Alright but the OP referred to http://www.asciitable.com/ which shows
CP437.

I haven't checked every codepoint in the bit described as "Extended
ASCII" but point 184 looks to me like 437 rather than 850. I can't say I
like that page much anyhow.
 
Alan J. Flavell

Picky, but also wrong :)
in my post s/there are national/there are other national/

Fine, I'll go with that...

in your post s/the national variant/a national variant/

Rather, s/the national (USA) variant/the USA national variant/,
to address your nitpick in the way that I had intended.

Way back (e.g. this old MS-DOS 5 manual which I have on the shelf),
cp437 was advertised as the "English" code page; but already by the
time of the public release of Win95 (as opposed to the beta, where I
had chosen to change the codepage to 850 for myself, despite the dire
warnings in the covering notes), MS were setting the DOS codepage as
cp850 for Latin-based locales. As far as I know (though I could be
wrong) they were still setting cp437 in the USA, though.

Alright but the OP referred to http://www.asciitable.com/ which shows
CP437.

Sure, I wasn't arguing about that part of the posting.

I haven't checked every codepoint in the bit described as "Extended
ASCII"

...a term which always sets off the bogosity alarms. There are
*numerous* 8-bit character codings which contain ASCII as their first
half.

but point 184 looks to me like 437 rather than 850.

Indeed. The "Extended ASCII" bogon *does* usually refer to cp437 in
my experience.

I can't say I like that page much anyhow.

Me too neither. For one thing, its claim that "it took a while to get
a single standard for these extra characters" is complete nonsense.

all the best
 
Michael Krueger

Hi,
Thanks for your fast reply; this really helped me a lot.
I'll try it with Unicode then.

Just want to draw those darn boxes ; )

michael
 
Thomas Dickey

On Fri, 18 Jun 2004, Ben Morrow wrote:
thx for your fast reply, this really helped me alot.
I'll try it with Unicode then.
Just want to draw those darn boxes ; )

He gave poor advice however. Most of the interesting terminals support
line-drawing, which any termcap interface (such as the one in Perl) can
support.
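
A minimal sketch of that line-drawing approach, with loud assumptions: the two escape sequences below are the ones xterm-style terminals use to enter and leave the alternate character set, hard-coded here for illustration. A real program would take them from the terminal's termcap entry (the 'as' and 'ae' capabilities) instead, and would consult 'ac' for the character mapping. The name acs_frame is made up.

```perl
use strict;
use warnings;

# ASSUMPTION: hard-coded xterm/VT100-style sequences. A real program
# should read these from termcap ('as' and 'ae') rather than guess.
my $ACS_ON  = "\e(0";    # switch to the alternate (line-drawing) set
my $ACS_OFF = "\e(B";    # switch back to the normal character set

# In the VT100 line-drawing set, l k m j are the four corners,
# q is a horizontal line and x is a vertical line.
# (acs_frame is a hypothetical helper name.)
sub acs_frame {
    my ($cols, $rows) = @_;
    my $h      = 'q' x ($cols - 2);
    my $top    = $ACS_ON . 'l' . $h . 'k' . $ACS_OFF;
    my $bottom = $ACS_ON . 'm' . $h . 'j' . $ACS_OFF;
    my $side   = $ACS_ON . 'x' . $ACS_OFF;
    my $middle = $side . (' ' x ($cols - 2)) . $side;
    return ($top, ($middle) x ($rows - 2), $bottom);
}

print "$_\n" for acs_frame(20, 5);
```

This sends only 7-bit ASCII, so it does not depend on the output encoding at all, only on the terminal supporting the alternate character set.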

The current version of ncurses is 5.4 (20040208)
There's an faq at
http://invisible-island.net/ncurses/ncurses.faq.html
 
Ben Morrow

Quoth Thomas Dickey said:
He gave poor advice however. Most of the interesting terminals support
line-drawing, which any termcap interface (such as the one in Perl) can
support.

Ah, I didn't know that... filed for future reference. Thank you.

FWIW, I always do boxes just with '+', '-' and '|'...

Ben
 
