file rename help

S

Shawn

Hello,

I am very novice as you will see in my Perl skills. I thought I'd write a
simple script to rename a list of files in a directory
based on some validation checks. The first checking if the file is empty
and the 2nd checking if there is the character /
in a certain position in the file. The script is below

My first problem when I was testing this was the script seeing the file with
data even though I had created a zero byte file.
So, regardless of the file size it found it with data. Which leads me to
believe I'm not checking the file size correctly.

The second problem being the renaming of the file. I am receiving a
insufficient arguments error which has stumped me.
In this case, I'm guessing maybe I have a error in the FILE_LIST?

Any suggestions would be greatly appreciated.

Thanks.

#!/usr/local/bin/perl5.6
use lib "lib";
use MIME::Lite;

open(FILE_LIST,"ls f* | grep -v txt|" ) || die "Cannot do \n" ;
while ($cfile = <FILE_LIST>) {
if ( -z "$cfile" ) {
system("/bin/mv $cfile $cfile.zero.txt");
print "$cfile has zero data\n";
}
else {
print "$cfile has data\n";
}

$charcount = system("cut -c9 $cfile |sort -u |grep '/'
|wc -l");
if ( $charcount > 0 ) {
chomp($csv_client_id = `cut -c1-6 $cfile |sort -u`);
print "character found for $cfile"\n" ;
system("/bin/mv $cfile
${cfile}.char.txt");

else {
system("/bin/mv $cfile
${cfile}.nochar.txt");
}
}
 
A

Andrew Hamm

Shawn said:
Hello,

I am very novice as you will see in my Perl skills.

Yes, there's a number of things you could improve, but I won't berate you
for that! Let's just deal with your current issues and trust that you will
learn in time to write nicer Perl code :)

However, before you go much further, it is HIGHLY recommended that you
start all scripts with:

#!/usr/bin/perl.....

use strict;
use warnings;
## no warnings 'uninitialized';

The 3rd commented-out line is a personal favourite of mine, but that's
because I am not afraid of undef values, and the warnings about undef
values usually irritate me far more than coding for them. However, I do
not recommend you use my 3rd line until you are far more experienced. In
fact, you would get a valuable clue about your problem from "use
warnings", and my 3rd line would promptly take away that valuable warning.
So - you've been .... warned. Get tough on yourself with the first two
lines, and maybe (only maybe) consider the 3rd line when you have a lot of
experience.
My first problem when I was testing this was the script seeing the
file with data even though I had created a zero byte file.
So, regardless of the file size it found it with data. Which leads
me to believe I'm not checking the file size correctly.

There are better ways to get a list of files, but telling you this won't
explain why you are getting a failure right now, so I'll let you get away
with the "ls f* | grep -v |"

When you get the name of the file with

$cfile = <FILE_LIST>

The name of the file will contain a newline character at the end of the
file. So you may think that $cfile contains something like "f01234" but
really it contains "f01234\n". You need to clean up that trailing newline,
and the usual way is:

while($cfile = <FILE_LIST>) {
chomp $cfile;

So - your filename is currently wrong, and thus the -z test returns a
failure code. Specifically it returns undef. The return value for an
accessible empty file is 1, and the return code for an accessible
non-empty file is 0. In an if( ) test, undef basically behaves as if it
was 0. Therefore it's going to the else part.

NOTE: with "use warnings", you'd get a warning that undef has been
returned, and that would be your clue I mentioned above.
The second problem being the renaming of the file. I am receiving a
insufficient arguments error which has stumped me.
In this case, I'm guessing maybe I have a error in the FILE_LIST?

I'm a bit stumped about this. I can't see where the "insufficient
arguments" might be comeing from; I expect that the /bin/mv is
complaining? Anyway, it's probably time to start discussing some of the
other things wrong with your first attempt; it will be easier in the long
run.

Instead of using the FILE_LIST handle, perl can do a lot of the work for
you:

while($cfile = <f*>) {
# no need to chomp; the <f*> creates a readable list of filenames
# and the names returned are "clean"

# skip filenames ending with .txt (my guess:)
next if $cfile =~ /\.txt$/;

instead of using system, you can use perl's rename function:

rename $cfile, "$cfile.zero.txt";

Instead of using the system( ) to invoke cut and sort and grep and wc, you
could do it in perl. Apart from the efficiencies of not having to invoke
other processes, there will other performance benefits since it's quite
likely that the search for a / will be found much earlier than reading the
entire file. Therefore some perl code can terminate the search earlier
than the entire cut command line you are using

NOTE: even as a piece of shell code, your command is wasting at least two
processes. I'd be inclined to write

cut -c9 filename | grep -c '/'

if i was writing this in shell script. It would be sufficient. The sort
does not help, and the use of wc is pointless when grep can also do a
count.

a perlish search method:

open IFD, "<", $cfile or die ......
my $found = 0;
while(<IFD>) {
if(substr($_, 8, 1) eq "/") {
$found = 1;
last; # break out of the loop
}
}
close IFD;

if($found) .....
chomp($csv_client_id = `cut -c1-6 $cfile |sort -u`);

I do not see where you are using the $csv_client_id, but I guess you are
building up a script. You should be able to extract this value in the loop
I have written above. If the csv_client_id could be anywhere in the file,
then perhaps the "last" command I have used in the loop should be removed.
system("/bin/mv $cfile ${cfile}.char.txt");
[SNIP]
system("/bin/mv $cfile ${cfile}.nochar.txt");

once again, use the built-in rename function.

Also, you do not need ${cfile}.nochar.txt when $cfile.nochar.txt will do.
It's conventional practice not to overdo the use of { } unless it's really
necessary, for example "${cfile}blahblah"
 
A

Andrew Hamm

Andrew said:
I'm a bit stumped about this. I can't see where the "insufficient
arguments" might be comeing from; I expect that the /bin/mv is
complaining?

Oh yes - I can see where the "insufficient arguments" is coming from, and
it is indeed coming from mv. The string you are passing to system is
effectively

/bin/mv f01234\n f01234\n.zero.txt"

so that's three lines of shell script:

/bin/mv f01234
f01234
..zero.txt

Clearly wrong. If you had used chomp to clean up the filename then the
system( ) call would have worked without errors.
 
P

Peter Sundstrom

Shawn said:
Hello,

I am very novice as you will see in my Perl skills.

You're not joking. Obviously your Unix scripting skills and programming
logic are also at a very novice level.

Your Perl script does 11 unnecessary forks. Very wasteful and inefficient.
Either write Perl or write a shell script, but don't end up wrapping a shell
script in Perl.

I thought I'd write a simple script to rename a list of files in a
directory
based on some validation checks. The first checking if the file is empty
and the 2nd checking if there is the character /
in a certain position in the file.

Given that a / is not a valid character for a Unix filename, then why are
you checking for it?
The script is below

#!/usr/local/bin/perl5.6
use lib "lib";
use MIME::Lite;

Why are you using MIME::Lite when you never reference it?
open(FILE_LIST,"ls f* | grep -v txt|" ) || die "Cannot do \n" ;

Perl has internal globbing. How useful is an error message that says
"Cannot do"

while ($cfile = <FILE_LIST>) {
if ( -z "$cfile" ) {

No need for quotes.
system("/bin/mv $cfile $cfile.zero.txt");

Perl has a rename function.
print "$cfile has zero data\n";
}
else {
print "$cfile has data\n";
}

$charcount = system("cut -c9 $cfile |sort -u |grep '/'
|wc -l");

Now the above line is just plain bizarre.

1. No need to fork off a whole bunch of external processes for this.
2. If you'd read the documentation for system, you'd realise the return
result is not want you expected.
3. No Unix filename is going to have a / in it.
4. Why are you sorting a single digit? Do you know that if you sort '5',
the sorted result is '5'?
if ( $charcount > 0 ) {

See comments about result results from system.
chomp($csv_client_id = `cut -c1-6 $cfile
|sort -u`);

No need for external processes to get the first six characters of a string.
What's this thing you have with sorting a single piece of data?
print "character found for $cfile"\n" ;

Well, you'll never get a / character in a file, so you'll never match the
condition.
system("/bin/mv $cfile
${cfile}.char.txt");

else {
system("/bin/mv $cfile
${cfile}.nochar.txt");
}
}

See rename comments.
 
J

Jim Keenan

Shawn said:
Hello,

I am very novice as you will see in my Perl skills. I thought I'd write a
simple script to rename a list of files in a directory
based on some validation checks. The first checking if the file is empty
and the 2nd checking if there is the character /
in a certain position in the file. The script is below

Other posters have commented on specific errors in your Perl code and on
the inefficiencies of mixing Perl with system calls. Since you
obviously do know something about the shell, you're not a complete
novice. Given that fact, you would benefit from systematically working
your way through an introductory level Perl textbook. For example,
Randal L Schwartz's "Learning Perl" (3rd ed., O'Reilly) covers many of
the topics you had trouble with in Chapters 11-13. If you do that, you
will quickly build on your pre-existing technical skills.

Jim Keenan
 
M

Michele Dondi

#!/usr/local/bin/perl5.6

use strict;
use warnings;

always!!
use lib "lib";

Why, BTW?
use MIME::Lite;

Why, BTW?
open(FILE_LIST,"ls f* | grep -v txt|" ) || die "Cannot do \n" ;

It's better to use lexical fhs. It's better to use low precedence
logical operators for flow control. It's better to give informative
error messages (at least include $!).

Said this, a shell script is a shell script and a Perl program is...
err, a Perl program! Each of them can be the best solution for a given
task, but there's little to no point at all in a perl script that
works like a shell script relying massively on external utilities.

Please note that I'm not saying one should *never* do that. Indeed in
some situations an open() like yours or a system() can be give you an
advantage, which can largely depend from one's subjective point of
view (e.g. less keystrokes to type!).

In this case Perl has both a glob() function, which even has a
shortcut (but many people think it's not a good practice to resort to
it) and a grep() one.

So in this case...
while ($cfile = <FILE_LIST>) {

....I'd use

for (grep !/txt/, <f*>) {

instead.
if ( -z "$cfile" ) {
system("/bin/mv $cfile $cfile.zero.txt");

You can use Perl's rename() here. (File::Copy can be preferrable in
some situations).
$charcount = system("cut -c9 $cfile |sort -u |grep '/'
|wc -l");

You do not want system() here: you want qx// instead, aka backticks
(but I don't like them! ;-)

Also, what if $cfile was empty and hence moved to "$cfile.zero.txt"?!?

However what you wrote above takes the 9th char from each line of
$cfile and sorts with -u, so that you either get exactly one or zero
line(s) with '/' (or better: being exactly '/') according to wether
there's at least one line with '/' in 9th position or not. Or am I
mistaken?

Perl code for the above _could_ be:

open my $fh, '<', $cfile or
die "Can't open `$cfile': $!\n";
my $charcount=0;
while (<$fh>) {
$charcount=1 and last if m|(?<=^.{8})/|;
}

(but I guess you really wanted something else!)
if ( $charcount > 0 ) {

You can use

if ($charcount) {

instead.
chomp($csv_client_id = `cut -c1-6 $cfile |sort -u`);

Perl's own code for this _could_ be:

open my $fh, '<', $cfile or
die "Can't open `$cfile': $!\n";
my %saw;
@csv_client_ids = grep !$saw{$_}++,
map { chomp; substr $_, 0, 6 } <$fh>;

Please note that I used an array instead of what you wrote because I
_think_ that what you wrote in the first place is not what you really
want.

Of course, since you read the contents of $cfile twice, then you'd
better open() it once and the either slurp it into an array all at
once (and then process it separately) if the file is small enough or
use two separate C<while> loops seek()ing in between them.

I think that it would be better for you to provide a description of
the task you're aiming at (or a suitably simplified version of it) and
we may hopefully provide a minimal hopefully working example script.


Michele
 
M

Michele Dondi

## no warnings 'uninitialized';

The 3rd commented-out line is a personal favourite of mine, but that's
because I am not afraid of undef values, and the warnings about undef
values usually irritate me far more than coding for them. However, I do

Indeed that is the single warning that most often needs to be disabled
out of good reason. But since it can still be useful, it should be
really disabled *locally* only in those situations in which you know
you could get it but you know you do not want to be warned as you
expected it.

Incidentally had I known you (and others, now I see) wrote this
followup, probably I would have not written mine. However the OP's
script is so full of bizarre constructs that I've not paid extreme
attention to each single one of them and so he will probably benefit
from many people supplying their own cmts.


Michele
 
M

Michele Dondi

$charcount = system("cut -c9 $cfile |sort -u |grep '/'
|wc -l");

Now the above line is just plain bizarre. [snip]
3. No Unix filename is going to have a / in it.

However bizarre that line (actually!) is, while indeed no unix
*filename* will have '/' in it, this doesn't mean the *file* won't as
well.


Michele
 
A

Andrew Hamm

Michele said:
Indeed that is the single warning that most often needs to be disabled
out of good reason. But since it can still be useful, it should be
really disabled *locally* only in those situations in which you know
you could get it but you know you do not want to be warned as you
expected it.

I've done a lot of SQL/4GL where (sql's)NULL is a very natural variable,
and also from C where I was a fan of exploiting the equivalence of
NULL/0/truth in conditionals, so if I screw up and needed "warnings
uninitialized" I would consider myself ready for the nursing home :) I
think it's a mistake akin to adding to customer names together and
expecting their account balance to pop out.

But, Shawn's experience shows that this warning is indeed valuable for a
lot of people. It would not have directly explained his problem, but it
could have made him think about a line of code that was failing.
Incidentally had I known you (and others, now I see) wrote this
followup, probably I would have not written mine. However the OP's
script is so full of bizarre constructs that I've not paid extreme
attention to each single one of them and so he will probably benefit
from many people supplying their own cm

yeah - the more the merrier. And everyone has been fairly nice too. Just
waiting for Shawn to say "thanks" to everyone ;-)
 
S

Shawn

Of course, thank you for your time and input. This is allot of info to take
in.

The warnings have been a great help. That along with a better understanding
of the usage of system. I think as Michele pointed out in another post the
separation
of the shell script and a Perl program is a obvious struggle that I have had
separating.
Another point that Andrew mentioned is this script is just the beginning and
building
to something bigger. Therefore the usage of the Mime and others.
 
S

Shawn

Oh yes - I can see where the "insufficient arguments" is coming from, and
it is indeed coming from mv. The string you are passing to system is
effectively

/bin/mv f01234\n f01234\n.zero.txt"

so that's three lines of shell script:

/bin/mv f01234
f01234
.zero.txt

Clearly wrong. If you had used chomp to clean up the filename then the
system( ) call would have worked without errors.

The output from the system call just wasn't something I had put into
consideration.
I obviously needed a better understanding of the system call for its usage.
 
S

Shawn

So in this case...
...I'd use

for (grep !/txt/, <f*>) {

I will give this a try. Just trying to find all files that start with a f
and don't have a txt extention.
instead.


You can use Perl's rename() here. (File::Copy can be preferrable in
some situations).


You do not want system() here: you want qx// instead, aka backticks
(but I don't like them! ;-)

Also, what if $cfile was empty and hence moved to "$cfile.zero.txt"?!?

However what you wrote above takes the 9th char from each line of
$cfile and sorts with -u, so that you either get exactly one or zero
line(s) with '/' (or better: being exactly '/') according to wether
there's at least one line with '/' in 9th position or not. Or am I
mistaken?

The purpose of the attempt to check if the 9th character is the '/', is
because it identifies the
file as belonging to one set of files or another. I think another post
pointed out a error that
is true. Instead of reading the entire file the first line should suffice.
If its not in the first line
then its not in the file at all.
Perl code for the above _could_ be:

open my $fh, '<', $cfile or
die "Can't open `$cfile': $!\n";
my $charcount=0;
while (<$fh>) {
$charcount=1 and last if m|(?<=^.{8})/|;
}

(but I guess you really wanted something else!)


You can use

if ($charcount) {

instead.


Perl's own code for this _could_ be:

open my $fh, '<', $cfile or
die "Can't open `$cfile': $!\n";
my %saw;
@csv_client_ids = grep !$saw{$_}++,
map { chomp; substr $_, 0, 6 } <$fh>;

Please note that I used an array instead of what you wrote because I
_think_ that what you wrote in the first place is not what you really
want.

Of course, since you read the contents of $cfile twice, then you'd
better open() it once and the either slurp it into an array all at
once (and then process it separately) if the file is small enough or
use two separate C<while> loops seek()ing in between them.

Again I didn't even consider that fact that I was opening the stinking file
twice.
I think that it would be better for you to provide a description of
the task you're aiming at (or a suitably simplified version of it) and
we may hopefully provide a minimal hopefully working example script.

Here is the task. I receive file(s) from business partners which come in 3
formats:
#1 - empty file and name it so
#2 - a file that contains '/' in the 9th position of the line and
therefore I need to
extract characters 1-6 and should use in the filename ex:
$cfile.chars1-6.txt (I hadn't gotten that far)
this is where the csv_client_ids was coming from. So, I was
just simplfying the
naming to $cfile.char.txt or $cfile.nochar.txt.
#3 - a file that doesn't contain the '/' in the 9th position and
therefore the extract characters
1-13 and use them in the file name $cfile.chars1-13.txt

It could be 1 file or up to 200 files at a time. Ultimately, this would
lead to emailing certain people
when different files are received. Plus, leaving it flexible enough should
a 4th or 5th format started coming.

Thanks,
Shawn
 
T

Tad McClellan

Shawn said:
The output from the system call just wasn't something I had put into
consideration.


That is fine, because the output from the system() call has not
been discussed anywhere in this thread that I remember.

It is certainly not related to the Nameless quote above
(please provide an attribution when you quote someone).

mv is likely to make _no_ output at all anyway...

I obviously needed a better understanding of the system call for its usage.


I think you need a better understanding of what Nameless was trying
to point out above.

There was nothing wrong with your system() call. It is
the *argument* to system() that is messed up.

Backtrack to where you construct that messed up argument
and figure out what it takes to un-mess it.
 
A

Anno Siegel

Shawn said:
I will give this a try. Just trying to find all files that start with a f
and don't have a txt extention.

A ".txt" extension? Then it should be

for ( grep !/\.txt$/, <f*> ) {

Anno
 
M

Michele Dondi

I will give this a try. Just trying to find all files that start with a f
and don't have a txt extention.

Not exactly: "files that start with an f and don't contain 'txt'". I
wrote it so because this is what you wrote in the first place.
The purpose of the attempt to check if the 9th character is the '/', is
because it identifies the
file as belonging to one set of files or another. I think another post
pointed out a error that
is true. Instead of reading the entire file the first line should suffice.
If its not in the first line
then its not in the file at all.

Then some code like

if ((my $line=<$fh>) =~ m|^.{8}/|) {
# ...
}

should do the job.
Again I didn't even consider that fact that I was opening the stinking file
twice.

Well, you were not _open()ing_ it directly. My cmt is referred to the
fact that I posted two somewhat self-contained code snippets that do
that, instead. But if you have to put them together than you'd better
qx/see above!/
Here is the task. I receive file(s) from business partners which come in 3
formats:
[snip rest]

This should do the job:


#!/usr/bin/perl

use strict;
use warnings;

sub ren;

my $new;
rename $_, ($new=ren $_) or
warn "Can't rename `$_' to `$new': $!\n"
for grep !/\.txt$/, <f*>;

sub ren {
local $_=shift;
return "$_.empty.txt" if -z;

open my $fh, '<', $_ or
die "Can't open `$_': $!\n";
my $ext;
for (my $line=<$fh>) {
chomp;
$ext=substr $_, 0,
m|^.{8}/| ? 6 : 13;
}
"$_.$ext.txt"
}

__END__


Michele
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top