Wide character in print

Yuri Shtil

Hi all

I am getting this warning when I try to print certain strings. Is it harmless?

If not, how do I get rid of it?

Yuri.
 
Gregory Toomey

Yuri Shtil said:
Hi all

I am getting this when I try to print certain strings. Is it harmless ?

If not, how do I get rid of it ?

Yuri.

Upgrade! Fixed on Eniac 2.

gtoomey
 
Eric Amick

Yuri Shtil said:
What is Eniac 2 ?

Sorry for ignorance !!!

It's a stupid joke. Ignore it. I suspect you're trying to print Unicode
characters to a filehandle that isn't expecting them. You should be able
to fix the problem by adding

binmode(FILEHANDLE, ":utf8");

after the opening of the filehandle. If that doesn't work, you should be
able to turn off the warning.

perldoc perldiag
 
Alan J. Flavell

Eric Amick said:
It's a stupid joke.

Well, I thought it was rather amusing; but then, the hon. Usenaut
could perhaps be advised to pay more attention to Usenet posting
conventions, and to entrust unknown terminology to a search engine of
their choice before revealing ignorance of the history of computers in
public... [An aside on the topic of character coding and old
computers: http://www.mailcom.com/besm6/ shows what can happen when
people try to put two different character codings into the same web
page - Mozilla decided it must be Chinese, with unfortunate
results...] [OK, so BESM-6 was a youngster compared to ENIAC]

Eric Amick said:
I suspect you're trying to print Unicode
characters to a filehandle that isn't expecting them.

OK, let's get serious.

There is a Perl document (perldiag) which lists the error messages
issued by perl itself. For 5.8.0 this document could be perused at
http://www.perldoc.com/perl5.8.0/pod/perldiag.html ,
although it's also part of any complete Perl installation.

This should be the _first_ recourse for any unrecognised message.

And indeed, here is the offending item:

Wide character in %s
(W utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print) but can be
turned off by no warnings 'utf8';. You are supposed to explicitly
mark the filehandle with an encoding, see open and perlfunc/binmode.
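
For anyone who wants to see the diagnostic in isolation, here is a minimal sketch that provokes it and then avoids it by marking the handle with an encoding. The in-memory filehandles, the variable names and the choice of UTF-8 are assumptions made for illustration; they are not taken from the original poster's code.

```perl
use strict;
use warnings;

my @seen;                                   # collected warning messages
local $SIG{__WARN__} = sub { push @seen, $_[0] };

# Print a character above 255 to a handle with no encoding layer.
my $plain = '';
open(my $fh1, '>', \$plain) or die "open: $!";
print {$fh1} "\x{436}";                     # CYRILLIC SMALL LETTER ZHE
close($fh1) or die "close: $!";

# Same print, but the handle is explicitly marked with an encoding.
my $marked = '';
open(my $fh2, '>:encoding(UTF-8)', \$marked) or die "open: $!";
print {$fh2} "\x{436}";
close($fh2) or die "close: $!";

print "warnings seen: ", scalar(@seen), "\n";
print $seen[0] if @seen;
```

Only the first print should warn; the second handle has been told what to do with wide characters.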

Seems to me that the key phrase here is "You are supposed to...".

Eric Amick said:
You should be able to fix the problem by adding

binmode(FILEHANDLE, ":utf8");

Do you think so? That tells Perl that the filehandle *is* expecting
utf-8 encoding, but if it isn't in fact expecting it, then it's
likely to cause an even worse problem.

If the hon. Usenaut is expecting a particular character coding on
their output, I would recommend (in 5.8.0) defining that coding in
an encoding layer, to give Perl the chance to convert between "Wide
characters" internally, and the expected encoding externally.

Without some context, I've no idea whether the material in question
might want to be koi8-r (the traditional encoding for Russian
Cyrillic), or nothing more exciting than Windows-1252; but either way,
an :encoding layer is what I'd recommend.
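
A rough sketch of what an :encoding layer does, assuming Perl 5.8 or later. The sample character and the two encodings are picked purely for illustration, and the in-memory filehandle is only there so the resulting bytes can be inspected; nothing here is known about the original poster's data.

```perl
use strict;
use warnings;

# "\x{436}" is CYRILLIC SMALL LETTER ZHE, a character above 255.
my $text = "\x{436}";

# Write the same string through two different encoding layers into
# in-memory files, then record the bytes that each layer produced.
my %bytes;
for my $enc ('koi8-r', 'UTF-8') {
    my $buffer = '';
    open(my $out, ">:encoding($enc)", \$buffer) or die "open: $!";
    print {$out} $text;
    close($out) or die "close: $!";
    $bytes{$enc} = unpack 'H*', $buffer;    # hex string of the raw bytes
}

print "koi8-r: $bytes{'koi8-r'}\n";   # d6
print "UTF-8:  $bytes{'UTF-8'}\n";    # d0b6
```

The same character comes out as one byte in koi8-r and two bytes in UTF-8, which is why the layer has to match what the reader of the output expects. For an already open handle the equivalent would be something like binmode(STDOUT, ":encoding(koi8-r)").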

The relevant documentation page that's called out from the binmode()
page is: http://www.perldoc.com/perl5.8.0/lib/open.html

(In earlier Perl versions, one needs to call the encoding explicitly,
instead of including it in the open/binmode calls).

Eric Amick said:
If that doesn't work, you should be able to turn off the warning.

But again: the warning is there for a reason. Just hiding the warning
doesn't make that reason go away. I would recommend identifying and
then solving the problem, not just hiding it.

You then added, almost it seems as an afterthought:
perldoc perldiag

Oh, right: but I'd suggest putting that up-front; IMNSHO it's the
single most important part of this reply.

cheers
 
Yuri Shtil

I am amazed how a simple question can start something close to a flame war
!!!
Are only those superbly educated in computer history allowed to participate in
this group ?

On a serious note, my problem showed up when I tried to parse/write XML
code that came from a third party application.
So I have no idea what to expect since the application does not specify the
encoding (or at least I don't know how to extract it).

These wide characters just showed up in some records.

There is another problem.

My code passes extracted XML strings to another application as counted
strings. It seems that the Perl length function returns an incorrect result
when these "wide" characters are present.

Again, please pardon my ignorance and try to avoid flaming each other.

 
Alan J. Flavell

Yuri Shtil said:
Are only those superbly educated in computer history allowed to participate in
this group ?

You're no fun in a usenet discussion...

Yuri Shtil said:
On a serious note, my problem showed up when I tried to parse/write XML
code that came from a third party application.
So I have no idea what to expect since the application does not specify the
encoding

But a text file is, in general, useless without a specification of its
character encoding.

Yuri Shtil said:
(or at least I don't know how to extract it).

It's not normally something that one can "extract" in any formal way
from the datastream itself; it's a piece of meta-data that goes along
with the data. However, with some samples and some knowledge of
context, someone could well offer a hypothesis.

Perhaps if you'd show the data in context (accompanied for example by
a hexadecimal dump of the bytes), someone could offer a suggestion
about what it is.
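
One possible way to produce such a hexadecimal dump, as a minimal sketch: hex_dump is a hypothetical helper name, and the sample bytes are made up to stand in for a record read from the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: format a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# Made-up sample bytes standing in for one record of the real data.
my $record = "caf\xc3\xa9";
print hex_dump($record), "\n";   # 63 61 66 c3 a9
```

For this to show what is really in the file, the data should be read without any decoding, for example from a handle opened with the "<:raw" layer.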

Yuri Shtil said:
These wide characters just showed up in some records.

That's not a very definite description of symptoms, you know. I think
we could have guessed that for ourselves based on your previous
presentation. I for one was hoping to see something more definite in
the way of an exhibit.

Yuri Shtil said:
There is another problem.

My code passes extracted XML strings to another application as counted
strings. It seems that the Perl length function returns an incorrect result
when these "wide" characters are present.

I'd have to guess that the Perl length function returns what it's
documented to return, but that you're expecting something different.
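
To illustrate the likely mismatch, here is a small sketch assuming the receiving application counts bytes of UTF-8 rather than characters. That is only a guess about the other application, and the sample string is invented.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two Cyrillic characters, each above code point 255.
my $string = "\x{436}\x{436}";

# length() on a character string counts characters.
my $chars = length $string;

# Encoding first and then taking the length counts the bytes that
# would actually be sent if the output encoding is UTF-8.
my $octets = length encode('UTF-8', $string);

print "characters: $chars, bytes in UTF-8: $octets\n";   # characters: 2, bytes in UTF-8: 4
```

If the counted strings are meant to carry a byte count, the second number is probably the one that is wanted, computed with whatever encoding is actually used on output.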

Yuri Shtil said:
Again, please pardon my ignorance

Lack of knowledge (ignorance) is NOT the issue here, and is a
perfectly normal and acceptable state of being, and (I think I can
speak for many another here) is one of the reasons why we come to
Usenet to share what we know. The *problem* is that you aren't
showing us any working, so we don't know exactly what you're trying,
we don't know exactly what results you are getting, we don't know what
you expected the answer to be, and so we can't really offer any
definite help.

If you haven't tried it yet I'd suggest
http://www.perldoc.com/perl5.8.0/pod/perluniintro.html
and then
http://www.perldoc.com/perl5.8.0/pod/perlunicode.html
with particular reference to #Byte-and-Character-Semantics

But most of all to
http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

have fun
 
Jürgen Exner

Alan said:
On Mon, Aug 4, Yuri Shtil continued in TOFU style: [...]
The *problem* is that you aren't
showing us any working, so we don't know exactly what you're trying,
we don't know exactly what results you are getting, we don't know what
you expected the answer to be, and so we can't really offer any
definite help.

If you haven't tried it yet I'd suggest
http://www.perldoc.com/perl5.8.0/pod/perluniintro.html
and then
http://www.perldoc.com/perl5.8.0/pod/perlunicode.html
with particular reference to #Byte-and-Character-Semantics

But most of all to
http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

I'd like to add http://www.catb.org/~esr/faqs/smart-questions.html to that
list.

jue
 
