Unwanted character "^@" in perl output

Captain 3-Putt

My output has a bunch of unwanted "^@" characters after every valid
character. It looks like:

^@F^@i^@r^@s^@t^@^M^@

I want it to look like:

First^M

Vim's 'ga' command shows "<^@> 0, Hex 00, Octal 000".

It's just a null character?? How do I get rid of it?

Thanks!

Ken

----------------------------------

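#!/usr/bin/perl
use strict;
use warnings;

# Read the file named on the command line, one line per element of @file.
open(BBOARD_DOWN, '<', $ARGV[0]) or die("Couldn't open $ARGV[0]: $!");
my @file = <BBOARD_DOWN>;
close(BBOARD_DOWN);

# The first line is the header; the remaining lines are student records.
my $header = $file[0];
my @student;
for (my $i = 1; $i <= $#file; ++$i) {
    $student[$i] = $file[$i];
}

# Print the header fields, each followed by a tab.
my @field = split(/\t/, $header);
for (my $i = 0; $i <= $#field; ++$i) {
    print "$field[$i]\t";
}

# Print the student records, one at a time.
for (my $i = 1; $i <= $#file; ++$i) {
    print $student[$i];
}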
 
Peter J. Holzer

Captain said:
> My output has a bunch of unwanted "^@" characters after every valid
> character. It looks like:
>
> ^@F^@i^@r^@s^@t^@^M^@

Looks like there is a NUL character *before* every valid character, not
after.
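To see why the NULs come *before* each character, here is a minimal sketch that assumes (as suspected below, not confirmed) that the file is UTF-16BE. It encodes the line "First" plus a CR LF ending with the core Encode module and renders the bytes the way vim does.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "First" plus CR LF, stored as UTF-16BE: every character becomes two
# bytes, and for ASCII characters the first (high) byte is 0x00.
my $bytes = encode('UTF-16BE', "First\r\n");

# Render the bytes the way vim does: NUL as ^@, CR as ^M.
(my $visible = $bytes) =~ s/\x00/^\@/g;
$visible =~ s/\r/^M/g;
$visible =~ s/\n\z//;    # the final LF byte is what vim treats as the line break

print "$visible\n";
```

This prints the same pattern as in the original post. A little-endian file (UTF-16LE) would put each NUL after its character instead.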

Are the NUL characters already in the file you are opening (the one named by $ARGV[0])?
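One way to answer that without guessing is to look at the raw bytes. The sketch below uses a hypothetical helper, guess_encoding (not from the thread): it reads the first 64 bytes with the :raw layer, prints them in hex, and guesses a UTF-16 byte order from a byte order mark, or else from whether the NUL bytes sit at even offsets (big-endian) or odd offsets (little-endian). The NUL heuristic is only meaningful for mostly ASCII text.

```perl
use strict;
use warnings;

# Hypothetical helper: guess from raw bytes whether data is UTF-16.
# ASCII text in UTF-16BE has its NUL bytes at even offsets, UTF-16LE at odd.
sub guess_encoding {
    my ($bytes) = @_;
    return 'UTF-16BE (BOM)' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE (BOM)' if $bytes =~ /\A\xFF\xFE/;

    my ($even, $odd) = (0, 0);
    for my $i (0 .. length($bytes) - 1) {
        next unless substr($bytes, $i, 1) eq "\x00";
        if ($i % 2) { $odd++ } else { $even++ }
    }
    return 'no NUL bytes'      if $even + $odd == 0;
    return 'probably UTF-16BE' if $even > $odd;
    return 'probably UTF-16LE' if $odd > $even;
    return 'unclear';
}

# Inspect the start of the file named on the command line, read with
# :raw so that no layer decodes or translates anything.
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Couldn't open $ARGV[0]: $!";
    my $head = '';
    defined(read($fh, $head, 64)) or die "Couldn't read $ARGV[0]: $!";
    close($fh);
    print unpack('H*', $head), "\n", guess_encoding($head), "\n";
}
```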

If they are (and I suspect they are, because I don't see where else they
could come from), the file is probably in UCS-2BE or UTF-16BE (the
difference only matters if the file contains characters outside the
BMP, like ancient Sumerian cuneiform), so you should open it with the
appropriate input layer:

open(BBOARD_DOWN, "<:encoding(UTF-16BE)", $ARGV[0]) or die ("Couldn't open $ARGV[0]");

This will decode the input into Perl's internal string representation.
You should also tell Perl which charset to convert to when printing to
STDOUT (by default it assumes ISO-8859-1, but switches to UTF-8 for
strings containing characters beyond U+00FF). E.g.,

binmode STDOUT, ":encoding(cp1252)";

if you want the Western Windows charset (which Microsoft likes to call
"ANSI" for reasons only known to them).
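Putting the two layers together, here is a minimal sketch of the whole round trip. It assumes the input really is UTF-16BE. Since the poster's file is not available, it writes a small stand-in file with File::Temp (the sample text, including the euro sign, is made up), and it prints through an in-memory handle so the resulting bytes can be checked; binmode on STDOUT behaves the same way. On Windows, where a :crlf layer sits underneath by default, UTF-16 input is usually opened with '<:raw:encoding(UTF-16BE)' instead, so that CRLF handling does not act on the raw UTF-16 bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustration only: a small UTF-16BE stand-in for the poster's file.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':raw:encoding(UTF-16BE)');
print {$tmp} "First\tLast\r\n", "Ken\t\x{20AC}5\r\n";
close($tmp) or die "Couldn't write $path: $!";

# Input layer: decode UTF-16BE into Perl characters, so no NULs remain.
open(BBOARD_DOWN, '<:encoding(UTF-16BE)', $path)
    or die "Couldn't open $path: $!";
my @file = <BBOARD_DOWN>;
close(BBOARD_DOWN);

# Output layer: encode characters as cp1252 on the way out
# (the euro sign U+20AC becomes the single byte 0x80).
open(my $out, '>', \my $buf) or die "Couldn't open in-memory handle: $!";
binmode($out, ':encoding(cp1252)');
print {$out} @file;
close($out);

print $buf;
```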

hp

PS: Does anybody know of a portable way to get the default charset from
the environment? I find hardcoding charsets in scripts extremely
inelegant.
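On Unix-like systems, one partial answer is to ask the C library for the locale's codeset through the core modules POSIX and I18N::Langinfo. This is a sketch, not a fully portable solution: the codeset string depends on the platform and locale, and the fallback to UTF-8 is an arbitrary choice made here for illustration.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Adopt the character-type locale from the environment (LC_ALL, LC_CTYPE, LANG).
setlocale(LC_CTYPE, '');

# Ask which codeset that locale uses, e.g. "UTF-8" or "ISO-8859-1".
my $codeset = langinfo(CODESET);

# Map it to an Encode name; fall back to UTF-8 if Encode doesn't know it.
my $obj = find_encoding($codeset);
my $enc = $obj ? $obj->name : 'UTF-8';

binmode STDOUT, ":encoding($enc)";
print "Output encoding: $enc\n";
```

The core open pragma also documents a :locale option (use open ':locale';) that sets default layers from the locale in a similar way.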
 
Peter J. Holzer

Todd said:
> <a bunch of interesting encoding voodoo>
>
> Hi Peter,
>
> Where did you learn all this stuff?

On Usenet :)

No, that's not entirely true, of course, but discussions in Usenet
groups and on mailing lists are where I learn the most, because they always
make me realize what I don't know, so I reread the documentation, write
small test scripts to prove (or disprove) some aspect, google for
missing bits, etc. So the real answer is: From all kinds of sources.

Since I've been programming for almost 23 years and discussing it
on Usenet for about 18 years now, and character sets have been a
constant source of problems during all this time, I've accrued a bit of
knowledge about this topic (and I'm still learning something new all the
time).

> I'm getting into doing work with international content. Though I'm limping
> along, I couldn't, for example, discuss it intelligently.
>
> I'd like to find some hardcopy for the bookshelf. Those that provide detail
> in relation to Perl, Apache, relational databases, and web browsers would be
> the most useful to me. Any suggestions?

No, sorry, I don't know any book which covers those topics. But then I
don't read many (non-fiction) books. As reference material I prefer
original documentation and standards, and if I'm looking for some
specific information, it is much faster to search the web than order
and read half a dozen books which may or may not contain it.

hp
 
Alan J. Flavell

Peter J. Holzer said:
> On Usenet :)

;-)

> Since I've been programming for almost 23 years and discussing it
> on Usenet for about 18 years now, and character sets have been a
> constant source of problems during all this time, I've accrued a bit
> of knowledge about this topic (and I'm still learning something new
> all the time).

Until these recent rounds of unicode-ification of Perl, most of my
encounters with the character coding issue have been in the context of
HTML (or of usenet discussions of HTML, which can actually make the
problem even more difficult and confusing - especially when goo-groups
sees fit to intervene and parse their strange characters in creative
ways).

I've come to the conclusion, over the years, that the most difficult
"cases" are people who already believe that they have a firm grasp of
the principles (when in fact they haven't), and who demand a simple
answer to what they consider to be a simple question. The reality is
that their "simple question" reveals that they actually need to
un-learn substantial parts of what they had previously been taking for
granted, and start again. But convincing them of that is hard.
Doing it both diplomatically *and* effectively is especially hard.

Those who confidently know what they mean by the term "character set"
can be particularly stubborn. This is compounded by the fact that the
MIME attribute called "charset=" specifies what we would nowadays call
a "character encoding scheme" (such as utf-8), *NOT* a coded character
set.
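A small sketch of that distinction, using the core Encode module: one character from the coded character set, U+00E9 (LATIN SMALL LETTER E WITH ACUTE), comes out as three different byte sequences depending on which charset= label is used. The choice of labels is only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character, identified by its code point in Unicode's coded character set.
my $char = "\x{E9}";

# Each charset= value names an encoding scheme: a way of turning
# characters into bytes. Show the resulting bytes in hex.
my %bytes;
for my $charset ('iso-8859-1', 'utf-8', 'UTF-16BE') {
    $bytes{$charset} = unpack('H*', encode($charset, $char));
    printf "charset=%-10s -> %s\n", $charset, $bytes{$charset};
}
```

The output lists e9, c3a9 and 00e9 for the three labels: same character, three encoding schemes.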

> No, sorry, I don't know any book which covers those topics.

Chapter 2 of the Unicode specification (available online in PDF)
is quite readable, considering the nature of its content: it puts
the terminology into the context of Unicode itself, and shows the
layering: non-negative integer values are assigned to the characters
of a character repertoire to form a "coded character set", and from
there a "character encoding form" and then a "character encoding
scheme" are defined. However, it isn't very informative about how
these differentiated terms apply to legacy encodings such as us-ascii,
iso-8859-1 etc., where the distinction between the terms is less
evident.
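The three layers can be made concrete with a character outside the BMP, which also shows why Peter's UCS-2/UTF-16 remark only matters there. This sketch uses U+12000 CUNEIFORM SIGN A; the surrogate-pair arithmetic is written out by hand for illustration, and the byte serializations come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Coded character set: CUNEIFORM SIGN A is assigned code point U+12000,
# outside the BMP, so UCS-2 cannot represent it at all.
my $char       = "\x{12000}";
my $code_point = sprintf 'U+%04X', ord $char;

# Encoding form: UTF-16 maps the code point to two 16-bit code units
# (a surrogate pair), computed here by hand.
my $v     = ord($char) - 0x10000;
my @units = (0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));

# Encoding scheme: the same code units serialized as bytes,
# big-endian or little-endian.
my $be = unpack 'H*', encode('UTF-16BE', $char);
my $le = unpack 'H*', encode('UTF-16LE', $char);

printf "%s  form: %s  UTF-16BE: %s  UTF-16LE: %s\n",
    $code_point, join(' ', map { sprintf '%04X', $_ } @units), $be, $le;
```

This prints U+12000 with code units D808 DC00, serialized as d808dc00 (UTF-16BE) and 08d800dc (UTF-16LE).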

> As reference material I prefer original documentation and standards,
> and if I'm looking for some specific information, it is much faster
> to search the web than order and read half a dozen books which may
> or may not contain it.

Indeed, but the web is unfortunately also awash with unreliable
"information", from people whose enthusiasm to share their discoveries
exceeds their technical competence. For example: don't get me started
on "symbol fonts in HTML" :-{{

hope these comments help a bit.
 
