ActiveState Perl mangles text files

Mothra

I wouldn't normally use this, but it seemed the easiest way to process my
"Blocked Senders" list in Outlook.

The code below should be straightforward, but look at the output it
produces when you run it on a Windows text file (created in Notepad).
It seems to put blank spaces (or unprintable characters) in between each
original character. If I write it back out to a text file, it's even worse.

Anyone know (or care) why? I used to write AS Perl scripts a couple of
years ago, and I'm sure it never did this.

---------code---------

#!/perl
use strict;
use IO::File;

my $input=IO::File->new('blocked_senders.txt', 'r');
my @blocked_senders=<$input>;
$input->close;

foreach(@blocked_senders){
print $_ . "\n";
}

------output-----------
C:\> .\process_blocked_senders.pl
?_ s h e m e k a @ t e l i a . c o m

c a r l o s _ m o r r i s i f @ b e v . c o m

d i a m o n d . 8 9 1 8 . 5 4 3 5 2 @ m - t - n . c o m

d y n e @ y o u a r e i n v i t e d t o . c o m

f d i d w p @ f l u c h t r e a k t i o n . d e

f f t y a 0 u d j @ w i d o m a k e r . c o m

g a n n o n p a l m a @ w a n a d o o . f r

j o a n b o g a c k i @ w i n t e r h i g h l a n d . c o . u k

k r y s t a _ h i g g i n s @ t e k m a i l e r . c o m

k w l r k k f h l p @ 1 6 3 . n e t

l e o p m z t @ 1 6 3 . n e t

l z h r @ 2 4 h o r a s . c o m

m i c h a e l d a r t @ r o j n a m e . c o m

n o r e p l y @ t i c k l e - i n c . c o m

p n r w n p e m l @ m s n . c o m

s m a r t 0 s o l o m o n @ u s w e s t . n e t

t h a w n n a b o l e k @ a c r o b a t m a i l . c o m

t i m k l o t t o 2 @ n e t s c a p e . n e t
 
Ben Morrow

Quoth Mothra said:
I wouldn't normally use this, but it seemed the easiest way to process my
"Blocked Senders" list in Outlook.

The code below should be straightforward, but look at the output it
produces when you run it on a Windows text file (created in Notepad).
It seems to put blank spaces (or unprintable characters) in between each
original character. If I write it back out to a text file, it's even worse.

Anyone know (or care) why? I used to write AS Perl scripts a couple of
years ago, and I'm sure it never did this.

---------code---------

#!/perl

Somehow I doubt you have perl installed as c:/perl.exe. This should
either be

#!c:/perl/bin/perl

or simply

#!perl
use strict;
use IO::File;

my $input=IO::File->new('blocked_senders.txt', 'r');
my @blocked_senders=<$input>;
$input->close;

foreach(@blocked_senders){
print $_ . "\n";
}

------output-----------
C:\> .\process_blocked_senders.pl
?_ s h e m e k a @ t e l i a . c o m

This is a charset problem: Notepad, as of Win2k, can create files in
UTF-16. You need Perl 5.8; then you can do:

#!/usr/bin/perl

use strict;
use warnings;

my @blocked_senders = do {
    open my $IN, '<:encoding(utf16)', 'blocked_senders.txt'
        or die "can't open blocked_senders.txt: $!";
    <$IN>;
};

print "$_\n" for @blocked_senders;

Note that if you want to write Notepad-compatible files you will have to
open them with '>:encoding(utf16)' as well.

Ben
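
To see why the original output looked "spaced out", here is a minimal sketch, assuming the file is UTF-16LE with a byte-order mark (which the '?_' at the start of the output suggests, though the thread never confirms it). Each ASCII character in such a file is followed by a zero byte, and a console tends to show those as blanks. The address and the temporary file are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Build a small UTF-16LE file with a byte-order mark, standing in for the
# real blocked_senders.txt (the address is hypothetical).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;    # write the bytes untouched, no CRLF translation
print {$tmp} "\xFF\xFE", encode('UTF-16LE', "someone\@example.com\r\n");
close $tmp or die "can't close $path: $!";

# Read the file back as raw bytes: the BOM and the zero bytes are all there.
open my $raw_fh, '<:raw', $path or die "can't open $path: $!";
my $bytes = do { local $/; <$raw_fh> };
close $raw_fh;
my $raw_length = length $bytes;

# Decode it: 'UTF-16' reads the BOM to work out the byte order.
my $text = decode('UTF-16', $bytes);
$text =~ s/\r?\n\z//;    # drop the line ending

printf "raw bytes: %d, decoded characters: %d\n", $raw_length, length $text;
print "$text\n";
```

On Windows the default :crlf layer can get in the way of a UTF-16 layer, so a layer string such as '<:raw:encoding(UTF-16):crlf' may be needed there; treat that as something to test on your own setup rather than a guarantee.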
 
Mothra

Ben said:
Somehow I doubt you have perl installed as c:/perl.exe. This should
either be

#!c:/perl/bin/perl

or simply

#!perl

And yet it still works! But the lines above worked too. I think AS
Perl only looks for the word perl in the first line.

Ben said:
This is a charset problem: Notepad, as of Win2k, can create files in
UTF-16. You need Perl 5.8; then you can do:

It's Perl 5.8.3 that I have.

Ben said:
Note that if you want to write Notepad-compatible files you will have to
open them with '>:encoding(utf16)' as well.

Being awkward now, but is there a way I do this with IO::File?
 
Mothra

Note that if you want to write Notepad-compatible files you will have to
Being awkward now, but is there a way I do this with IO::File?
Actually ignore that...
 
Mothra

Note that if you want to write Notepad-compatible files you will have to
Being awkward now, but is there a way I do this with IO::File?
Sorry - was having a 'senior moment' there.
 
Ben Morrow

Quoth Mothra said:
Being awkward now, but is there a way I do this with IO::File?

Why? Perl's lexical filehandles (the 'open my $IN' in my example) make
IO::File obsolete, AFAICS.

Yes, there are two ways:

my $IN = IO::File->new('filename', '<:encoding(utf16)');

my $IN = IO::File->new('filename', 'r');
binmode $IN, ':encoding(utf16)';

Ben
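
A runnable sketch of the two IO::File forms above, assuming a UTF-16LE file with a byte-order mark. The temporary file and the addresses are invented for the example, and ':raw' is added in front of the encoding layer as a precaution against CRLF translation on Windows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical test data: two addresses in a UTF-16LE file with a BOM.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xFF\xFE",
    encode('UTF-16LE', "first\@example.com\nsecond\@example.com\n");
close $tmp or die "can't close $path: $!";

# Form 1: put the layers in the mode string passed to IO::File->new.
my $in1 = IO::File->new($path, '<:raw:encoding(UTF-16)')
    or die "can't open $path: $!";
my @via_mode = <$in1>;
$in1->close;

# Form 2: open for reading as usual, then add the layers with binmode.
my $in2 = IO::File->new($path, 'r')
    or die "can't open $path: $!";
binmode $in2, ':raw:encoding(UTF-16)';
my @via_binmode = <$in2>;
$in2->close;

chomp(@via_mode, @via_binmode);
print "$_\n" for @via_mode;
```

Both forms should yield the same decoded lines; which one reads better is mostly a matter of taste.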
 
Matt Garrish

Mothra said:
And yet it still works! But the lines above worked too. I think AS
Perl only looks for the word perl in the first line.

Actually, shebang lines in Windows are *almost* entirely useless (but don't
hurt for portability). The use of perl to execute your .pl files is done
through file associations (a.k.a. the registry). So long as the association
is correct for the file type, you can double-click on your scripts to run
them (which, in my opinion, is a useless feature unless the script is fully
debugged). XP now integrates the file associations into calls from a command
prompt, so you can get away with just using the file name. In older
versions, you need perl in your path and have to make an explicit call to
the executable.

All that said, perl will still check for switches like -w and -T (the latter
won't work inside a script on Windows, though). And running scripts under
Apache is a whole other story... : )

Matt
 
Jim Keenan

Mothra said:
I wouldn't normally use this, but it seemed the easiest way to process my
"Blocked Senders" list in Outlook.

The code below should be straightforward, but look at the output it
produces when you run it on a Windows text file (created in Notepad).
It seems to put blank spaces (or unprintable characters) in between each
original character. If I write it back out to a text file, it's even worse.

Anyone know (or care) why? I used to write AS Perl scripts a couple of
years ago, and I'm sure it never did this.

---------code---------

#!/perl

Unless something has changed lately, I don't think the shebang line is
relevant in AS Perl. I have the perl executable in my path and call
'perl iofile.pl' from the command line.

use strict;
use IO::File;

my $input=IO::File->new('blocked_senders.txt', 'r');
my @blocked_senders=<$input>;
$input->close;

foreach(@blocked_senders){
print $_ . "\n";
}

Since you didn't supply 'blocked_senders.txt' I had to make some
assumptions as to how it looked. Assumptions: (1) Single '\n' at end
of line rather than two; (2) All lines flush against left margin
rather than starting with 2 wordspaces; (3) Eliminate '?_' garbage on
first e-mail address; (4) eliminate all wordspaces. In other words:

##### START ASSUMED SOURCE (blocked_senders.txt) #####
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
(e-mail address removed)
##### END ASSUMED SOURCE #####

Once I cleaned up the source and ran your code, the only peculiarity I
noticed was that the output came out with two line-returns rather than
one. Not surprising, because there was no 'chomp' in the 'foreach'
loop.

foreach(@blocked_senders) {
chomp;
print $_ . "\n";
}

So your Perl code does not appear to be the problem per se (unless I'm
missing something).

Jim Keenan
 
Mothra

Ben said:
Why? Perl's lexical filehandles (the 'open my $IN' in my example) make
IO::File obsolete, AFAICS.

Hmm... I always thought of it the other way round - IO::File did come
later after all. Using IO::File can be useful where you actually want
your file object to be restricted by scope (filehandles are
automatically global). So I've just gotten into the habit of always
using it.
 
Ben Morrow

Quoth Mothra said:
Hmm... I always thought of it the other way round - IO::File did come
later after all. Using IO::File can be useful where you actually want
your file object to be restricted by scope (filehandles are
automatically global). So I've just gotten into the habit of always
using it.

Really? I'm surprised...

The point about lexical FHs is that they *are* scoped. In my code:

my @lines = do {
open my $IN, '<', 'file' or die ...;
<$IN>
};

the FH is closed at the end of the scope.

Ben
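
The scoping behaviour described above can be checked with a short sketch. It assumes nothing beyond core Perl 5.6 or later; the temporary file and the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to write to and read back from.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "can't close $path: $!";

{
    open my $out, '>', $path or die "can't open $path: $!";
    print {$out} "Foo\n";
    # No explicit close: when $out goes out of scope at the closing brace,
    # perl flushes and closes the handle.
}

# $out is not visible here. Under 'use strict', mentioning it would be a
# compile-time "Global symbol ... requires explicit package name" error.

# Read the file back using the same do-block idiom.
my @lines = do {
    open my $in, '<', $path or die "can't open $path: $!";
    <$in>;
};

print "read back: $_" for @lines;
```

The data written inside the bare block is already on disk by the time the file is reopened, which is only possible if the first handle was closed when its scope ended.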
 
Mothra

Hmm... I always thought of it the other way round - IO::File did come
Really? I'm surprised...

The point about lexical FHs is that they *are* scoped. In my code:

my @lines = do {
open my $IN, '<', 'file' or die ...;
<$IN>
};

the FH is closed at the end of the scope.

Ben

Then why would the first example (below) write to the file, but the
second won't?

This will write to the file "foobar.txt"
-----------------
use strict;
if ( 1 == 1 ) {
open(FILE, ">foobar.txt");
}

print FILE "Foo";
-----------------

.....whereas this will fail with: Global symbol "$file" requires explicit
package name at opening.pl line 8. Execution of opening.pl aborted due
to compilation errors.
-----------------
use strict;
use IO::File;
if ( 1 == 1 ) {
my $file = IO::File->new('foobar.txt', 'w');
}

print $file "Foo";
-----------------
 
Mothra

Jim said:
Unless something has changed lately, I don't think the shebang line is
relevant in AS Perl. I have the perl executable in my path and call
'perl iofile.pl' from the command line.



Since you didn't supply 'blocked_senders.txt' I had to make some
assumptions as to how it looked. Assumptions: (1) Single '\n' at end
of line rather than two; (2) All lines flush against left margin
rather than starting with 2 wordspaces; (3) Eliminate '?_' garbage on
first e-mail address; (4) eliminate all wordspaces. In other words:

I think that was actually the problem. The text file was created in
"Windows Notepad" by MS Outlook 2003 in a particular format, which is
where all the garbage is coming from. If I pasted the same text into a
file on a Unix machine, it all worked fine. I should have opened the
file and re-saved it in the correct format. Bloody awkward Notepad :-(
 
A. Sinan Unur

....

....

Then why would the first example (below) write to the file, but the
second won't?

Mothra, you are missing the crucial part of Ben's post about _lexical_
filehandles. Did you see the my $IN part after open in the line above?

This will write to the file "foobar.txt"
-----------------
use strict;
if ( 1 == 1 ) {
open(FILE, ">foobar.txt");
}

print FILE "Foo";
-----------------

Try this:

use strict;

if (1) {
open my $FILE, '>', 'foobar.txt' or die $!;
}

print $FILE "Foo";
 
Mothra

A. Sinan Unur said:
Mothra, you are missing the crucial part of Ben's post about _lexical_
filehandles. Did you see the my $IN part after open in the line above?

I did but didn't understand the significance of it. My bad.
 
Mothra

Ben said:
Really? I'm surprised...

The point about lexical FHs is that they *are* scoped. In my code:

my @lines = do {
open my $IN, '<', 'file' or die ...;
<$IN>
};

the FH is closed at the end of the scope.

Ben

Sorry I must have misunderstood your first response. I was taught on a
training course a couple of years ago that using IO::File->new was the
best method for opening files. I was given a few reasons, one of which
was the fact that you can scope your file objects. I honestly can't
remember any other reasons. I was shown other methods such as the
bareword method, which was what I thought you were referring to until I
read your post more carefully. Wasn't shown the lexical filehandles,
which I find surprising, as the course was a pretty good one.

From what you've written, it appears that I was given misleading
information there?
 
Ben Morrow

Quoth Mothra said:
Then why would the first example (below) write to the file, but the
second won't?

This will write to the file "foobar.txt"
-----------------
use strict;
if ( 1 == 1 ) {
open(FILE, ">foobar.txt");
}

print FILE "Foo";

I said '*lexical* FHs'. Not global bareword FHs. They are rather
different beasts. Try:

use strict;
{
open my $FILE, '>', 'foobar.txt' or die $!;
}

print $FILE 'Foo';

Ben
 
Ben Morrow

Quoth Mothra said:
Sorry I must have misunderstood your first response. I was taught on a
training course a couple of years ago that using IO::File->new was the
best method for opening files. I was given a few reasons, one of which
was the fact that you can scope your file objects. I honestly can't
remember any other reasons. I was shown other methods such as the
bareword method, which was what I thought you were referring to until I
read your post more carefully. Wasn't shown the lexical filehandles,
which I find surprising, as the course was a pretty good one.

From what you've written, it appears that I was given misleading
information there?

One of my reasons for the 'I'm surprised' is that I am fairly sure that
lexical FHs are considerably newer than IO::Handle. It is entirely
likely that they weren't invented when you took the course.

Checking perl56delta confirms my suspicion that lexical FHs were new in
5.6.0...

Ben
 
Jim Keenan

Mothra said:
The text file was created in
"Windows Notepad" by MS Outlook 2003 in a particular format, which is
where all the garbage is coming from. If I pasted the same text into a
file on a Unix machine, it all worked fine. I should have opened the
file and re-saved it in the correct format. Bloody awkward Notepad :-(

I think the crucial part is "by MS Outlook 2003". On my day job
Notepad is all I have to write code with, but I never have problems
transferring the files to other systems. So I would infer the problem
comes from what Outlook does (though I've never used Outlook myself,
so I can't be sure).

Jim Keenan
 
Joe Smith

Mothra said:
Sorry I must have misunderstood your first response. I was taught on a
training course a couple of years ago that using IO::File->new was the
best method for opening files.

With perl version 5.6.0 or later, the best method is to use the
three-argument open and lexical variables (or any variable whose value
is undef at the time of the open).

open my $inputhandle, '<', $filename or die "...";
open my $outputhandle, '>', ' this file has leading and trailing spaces ';
open my $loghandle, '>>', $logname or warn "open($logname) $!";

No more prepending "./" or appending "\0" for file names with blanks.
-Joe
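
A small sketch of the point about blanks in file names, assuming a filesystem that allows a trailing space in a name (Unix-like systems do; Windows generally strips it). The directory and file name are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical file whose name ends in a space.
my $dir  = tempdir(CLEANUP => 1);
my $name = File::Spec->catfile($dir, 'notes.txt ');

# Three-argument open takes the name literally, trailing space included.
open my $out, '>', $name or die "can't open '$name': $!";
print {$out} "hello\n";
close $out or die "can't close '$name': $!";
my $three_arg_ok = -e $name ? 1 : 0;

# Two-argument open, for comparison: perl trims whitespace from the ends of
# the name, so it looks for 'notes.txt' without the space instead.
my $two_arg_ok = open(my $in, "<$name") ? 1 : 0;

print "three-arg open created the file: $three_arg_ok\n";
print "two-arg open found the file: $two_arg_ok\n";
```

The two-argument call is there only to show the difference; it is not something to copy into new code.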
 
