Text::Autoformat usage for a rookie


Gary Schenk

I try to use Perl in my work to solve repetitive tasks. I am
self-taught so far, with no mentor to badger with questions, so
I now badger the group. My apologies in advance.

I have an immense text file, with very long lines, sometimes over
900 characters long. I would like to format this so that the
lines are around 77 characters. I have been cutting out small
bits, reformatting the lines in Notepad, then exporting these
smaller text files into a CADD program. This is time-consuming
and very tedious. An obvious job for Perl. I thought it would be
an easy regexp exercise, but keeping the words in one piece is
beyond my abilities right now. The Unix fmt command didn't help,
either.

I discovered Text::Autoformat. I have not been able to get it to
work. I get a usage error. I have Googled trying to find
examples of usage, and read up on it on CPAN, but I don't get
it. It seems as though people use it to format simple strings to
STDOUT. I want to format files.

Could someone point me in the right direction to getting this
code to work? I'm missing something terribly basic, I know. I
need a nudge in the right direction.

This is the latest version of the program:

#!/usr/bin/perl -w

use Text::Autoformat;

print "\n\nEnter a file to convert, or 'q' to quit: ";
chomp( $input = <STDIN> );

if ( $input ne 'q' ) {
open( INPUT, "<$input" ) or die( "Can't open $input for reading: $!" );
open( OUTPUT, ">output.txt" ) or die( "Can't open output.txt: $!" );

my $fixed = autoformat( <INPUT>, { left => 1, right => 77, all => 1 } );
print OUTPUT $fixed;
}

else {
print "Bye!\n";
}

The most recent error message:

Usage: autoformat([text],[{options}]) at ./nfmt.pl line 12

Thanks.
 

Ben Morrow

Quoth Gary Schenk:
#!/usr/bin/perl -w

use warnings; # instead of -w
use strict;

Have you read the Posting Guidelines?
use Text::Autoformat;

print "\n\nEnter a file to convert, or 'q' to quit: ";
chomp( $input = <STDIN> );
That needs a 'my' under strict:
chomp( my $input = <STDIN> );

I would use ^D (EOF) (^Z on Win32) to signal 'quit' instead, as this is
more usual with programs like this; you can detect EOF because $input
will be undef. Also, I would check @ARGV for files before prompting, so
you can call it like

./script file1

if you want to.
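Ben's two suggestions (take file names from @ARGV when present, otherwise prompt until EOF instead of a magic 'q') might be sketched as below. The name for_each_file and its callback argument are hypothetical, introduced only so the loop can be reused; the prompt goes to STDERR so it won't mix with anything the program writes to STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: call $process->($name) once per file to handle.
# Names come from @$args (e.g. \@ARGV) if any were given; otherwise they
# are read one per line from $in until EOF (^D on Unix, ^Z then Enter on Win32).
sub for_each_file {
    my ($args, $in, $process) = @_;

    if (@$args) {
        $process->($_) for @$args;
        return;
    }

    while (1) {
        print STDERR "Enter a file to convert (EOF to quit): ";
        my $name = readline $in;
        last unless defined $name;            # readline returns undef at EOF
        chomp $name;
        $process->($name) if length $name;    # skip blank lines
    }
    print STDERR "\nBye!\n";
}
```

In a real script this would be called as for_each_file(\@ARGV, \*STDIN, \&format_file), where format_file stands for whatever sub does the reading, formatting and writing.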
if ( $input ne 'q' ) {

I would use 'unless' here, but you may find that more confusing.
open( INPUT, "<$input" ) or die( "Can't open $input for reading: $!" );

You have checked the return of open: good.
You have included the file and $! in the error: good. :)
Use lexical FHs.
Use 3-arg open.
Don't use unnecessary parens.

open my $INPUT, '<', $input or die "...";
open( OUTPUT, ">output.txt" ) or die( "Can't open output.txt: $!" );

This is generally a bad idea... you would be much better off (say)
renaming the old file to "$input~" and overwriting, or creating the new
file as "$input.fmt"; or simply processing STDIN to STDOUT and letting
the user redirect where he will.
my $fixed = autoformat( <INPUT>, { left => 1, right => 77, all => 1 } );

[I am going to assume at this point that Text::Autoformat::autoformat is
unprototyped... I don't have a copy of the module to hand to check. See
perlsub for prototypes, and type

perl -MText::Autoformat -le'print prototype \&Text::Autoformat::autoformat'

to check I'm correct (it should print nothing).]

If it is, then its args will be evaluated in list context. Context is
one of the most important concepts in Perl, so try to get your head
around it :). The <> operator in list context returns a list of all the
lines in the file, where 'lines' are delimited with $/ which is "\n" as
you haven't changed it. So this will call autoformat like

autoformat(
"line one of file...\n",
"line two of file...\n",
...,
{ opts }
)

which does not match the ([text], [options]) it was expecting. You want
to get the whole file into a scalar; there are (at least) two ways of
doing this. The 'obvious' one is

$/ = undef; # use local for other than very small scripts
my $fixed = autoformat scalar <INPUT>, {...};

where $/ and the special meaning of undef is explained in perlvar, and
the 'scalar' forces scalar context on the <> (this isn't strictly
necessary in this case, as it will just return one 'line' consisting of
the whole file anyway, but I prefer to add it as documentation that we
are only getting one value back). A better way is to use Uri's
File::Slurp module, like this

use File::Slurp qw/read_file/;

my $text = read_file $input;
my $fixed = autoformat $text, {...};

Read the docs for how to handle errors: the default as above is probably
fine for small scripts. This is better because it is both faster
(read_file uses unbuffered IO where it can, which can make the reading
much faster) and cleaner (there's no need to open the file or mess
around with $/).
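If File::Slurp isn't installed, core Perl can do the same job. This sketch (the sub name slurp is just illustrative) wraps the $/ approach in a sub, using local as recommended above so the change to $/ is undone as soon as the do block exits, and a lexical filehandle with three-argument open:

```perl
use strict;
use warnings;

# Read an entire file into one string without File::Slurp.
# 'local $/' makes the input record separator undef only inside the
# do block, so <$fh> returns the whole file and later line-by-line
# reads elsewhere in the program are unaffected.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path for reading: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return $text;
}
```

Then autoformat slurp($input), {...} gets exactly one string as its text argument.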

Ben
 

Anno Siegel

[...]
I have an immense text file, with very long lines, sometimes over
900 characters long. I would like to format this so that the
lines are around 77 characters.
[...]

I discovered Text::Autoformat. I have not been able to get it to
work. I get a usage error. I have Googled trying to find
examples of usage, and read up on it on CPAN, but I don't get
it. It seems as though people use it to format simple strings to
STDOUT. I want to format files.

That's the "minimal use" described in SYNOPSIS. What's the
problem?
Could someone point me in the right direction to getting this
code to work? I'm missing something terribly basic, I know. I
need a nudge in the right direction.

This is the latest version of the program:

#!/usr/bin/perl -w

No strict, no warnings. Bad.
use Text::Autoformat;

print "\n\nEnter a file to convert, or 'q' to quit: ";
chomp( $input = <STDIN> );

if ( $input ne 'q' ) {
open( INPUT, "<$input" ) or die( "Can't open $input for reading: $!" );
open( OUTPUT, ">output.txt" ) or die( "Can't open output.txt: $!" );

my $fixed = autoformat( <INPUT>, { left => 1, right => 77, all => 1 } );

autoformat() expects a single string $text as its first argument,
possibly followed by a hashref of options. <INPUT> expands to the
list of lines in the file, which is not a single string, confusing
autoformat().
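The difference is easy to see with a stand-in function. This sketch uses a hypothetical count_args in place of autoformat and an in-memory filehandle in place of a real file, and just counts the arguments each call receives:

```perl
use strict;
use warnings;

# Hypothetical stand-in for autoformat(): report how many arguments arrived.
sub count_args { return scalar @_ }

my $data = "line one\nline two\nline three\n";

# <$fh> in an argument list is in list context: one element per line.
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my $n_list = count_args(<$fh>, { right => 77 });    # 3 lines + 1 hashref
close $fh;

# Slurping first gives a single string, which is what autoformat expects.
open $fh, '<', \$data or die "Can't open in-memory file: $!";
my $text = do { local $/; <$fh> };
my $n_scalar = count_args($text, { right => 77 });  # 1 string + 1 hashref
close $fh;

print "list context: $n_list args; slurped: $n_scalar args\n";
# prints: list context: 4 args; slurped: 2 args
```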

Read the file into a variable

my $text = do { local $/; <INPUT> };

or use Uri's baby File::Slurp. Then call autoformat with that variable:

my $fixed = autoformat( $text, { left => 1, right => 77, all => 1});
print OUTPUT $fixed;
}

That should take you a step further.
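Combining that fix with Ben's style points (strict, lexical filehandles, three-argument open, no hardcoded output name) gives something like the script below. It is one possible arrangement rather than the poster's actual code: it assumes Text::Autoformat is installed from CPAN, takes file names from the command line, and writes each result next to its input with a .fmt suffix (Ben's option 2 later in the thread); format_file is an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Autoformat;    # CPAN module; exports autoformat()

# Reformat one file to lines of roughly 77 characters. The original is
# left untouched and the result is written to "$input.fmt".
sub format_file {
    my ($input) = @_;

    open my $in, '<', $input or die "Can't open $input for reading: $!";
    my $text = do { local $/; <$in> };    # one string, not a list of lines
    close $in;

    my $fixed = autoformat $text, { left => 1, right => 77, all => 1 };

    my $output = "$input.fmt";
    open my $out, '>', $output or die "Can't open $output for writing: $!";
    print {$out} $fixed;
    close $out or die "Can't close $output: $!";
    return $output;
}

# Usage: ./nfmt.pl file1 file2 ...
format_file($_) for @ARGV;
```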

Anno
 

Gary Schenk

Anno Siegel wrote:
That should take you a step further.

Anno


That did more than take me a step forward. It solved the problem. This
little program will save me at least a week's worth of work, maybe
more!

Thanks very much, Anno. Not just for the suggestion, but for helping
me learn something about Perl.

Gary
 

Gary Schenk

Ben Morrow said:
use warnings; # instead of -w
use strict;

Have you read the Posting Guidelines?

I have now.
That needs a 'my' under strict:
chomp( my $input = <STDIN> );

I would use ^D (EOF) (^Z on Win32) to signal 'quit' instead, as this is
more usual with programs like this; you can detect EOF because $input
will be undef. Also, I would check @ARGV for files before prompting, so
you can call it like

./script file1

if you want to.


I would use 'unless' here, but you may find that more confusing.

I find lots of Perl confusing. I have a simple mind, and try to keep
things simple.
You have checked the return of open: good.
You have included the file and $! in the error: good. :)
Use lexical FHs.
Use 3-arg open.
Don't use unnecessary parens.

open my $INPUT, '<', $input or die "...";

I have lots to learn about idioms in Perl. For example, in Anno's
reply, I still have not quite figured out how some of it works.
This is generally a bad idea... you would be much better off (say)
renaming the old file to "$input~" and overwriting, or creating the new
file as "$input.fmt"; or simply processing STDIN to STDOUT and letting
the user redirect where he will.

I have a need to keep the original file intact. However, that is
laziness on my part. (A Perl virtue?) Why is this "generally a bad
idea"?


Thanks for the thoughtful reply. There is a lot here, and in Anno's
reply, for me to digest. It should keep me busy for some time!

I find it interesting that the Perl in my class and in my books is
different from the Perl in this newsgroup. So much to learn, so little
time.

Gary
 

Ben Morrow

Quoth Gary Schenk:
I have a need to keep the original file intact. However, that is
laziness on my part. (A Perl virtue?) Why is this "generally a bad
idea"?

Sorry, I was unclear... hardcoding an output filename is generally a bad
idea. It makes the program more confusing to use, and (for instance)
will clobber the old output if you run it twice.

Keeping the old file intact is fine; my recommendations, expressed a
little more verbosely, were:

1. rename the original file to "$input~", and write the new data to a
new file $input. This will leave the unformatted text in "$input~" and
the formatted data under the original filename. My choice of '~' is
standard on Unix systems; if you are on Win32 you may prefer '.bak'; if
elsewhere something else.

2. Choose some suffix such as '.fmt' and append it to the input
filename to get the file to write the output to. This will leave the
input data where it was and put the output somewhere you can find it.

3. Process from STDIN to STDOUT. This means the program is invoked as

myfmt < input > output

and you can put the output where you like.
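Option 1 (keep the original under a backup name and write the formatted text under the original name) could look like this. replace_with_backup and its $transform argument are hypothetical names; passing the formatting step in as a code reference keeps the file juggling separate from the formatting itself.

```perl
use strict;
use warnings;

# Hypothetical helper for option 1: move $input to "$input$suffix"
# ('~' by default; '.bak' may suit Win32 better), then write
# $transform->(old text) under the original name. If writing fails,
# the original text is still safe in the backup file.
sub replace_with_backup {
    my ($input, $transform, $suffix) = @_;
    $suffix = '~' unless defined $suffix;
    my $backup = "$input$suffix";

    rename $input, $backup or die "Can't rename $input to $backup: $!";

    open my $in, '<', $backup or die "Can't open $backup for reading: $!";
    my $text = do { local $/; <$in> };
    close $in;

    open my $out, '>', $input or die "Can't open $input for writing: $!";
    print {$out} $transform->($text);
    close $out or die "Can't close $input: $!";
    return $backup;
}
```

With Text::Autoformat loaded, a call might be replace_with_backup('notes.txt', sub { autoformat $_[0], { right => 77, all => 1 } }, '.bak'). Option 3 needs no file handling at all; on a Unix-like shell, something along the lines of

perl -MText::Autoformat -0777 -ne 'print autoformat $_, { right => 77, all => 1 }' < input > output

should work, since -0777 makes Perl read the whole input as a single record.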

Ben
 
