file renamer... request feedback

Matt Garrish

So what you're saying is that after the first substitution in global context
s///g, the char position pointer has no notion of beginning of line?

Well, I think it could be written that /^/ is still valid after every
substitution.

Or are you saying the documentation says /^/ is *ONLY* valid before the
*FIRST* substitution in global context?

Why do you keep writing gibberish?

What you fail to understand in this case is that the string position is not
the issue; you'd also have to tell the regex engine to drop the string it
was operating on and instead use the just-modified string. I'd love to see
you write any kind of meaningful regular expression when you have to
consider what the string will look like after every substitution, knowing
that what you just substituted in may get operated on again.

But then, that's why you're not a very good programmer...

Matt
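A small sketch that makes Matt's point concrete, using only core Perl (the sub names are made up for illustration): s///g walks the original string once, so ^ without /m can only match at offset 0, and text that was just substituted in is never scanned again. Re-applying an anchored substitution to the modified string takes an explicit loop.

```perl
use strict;
use warnings;

# s///g scans the ORIGINAL string once, left to right.
# Without /m, ^ can only match at offset 0 of that string.
sub strip_once {
    my ($s) = @_;
    $s =~ s/^a//g;
    return $s;
}

# To keep applying an anchored substitution to the modified
# string, loop explicitly until it stops matching.
sub strip_repeatedly {
    my ($s) = @_;
    1 while $s =~ s/^a//;
    return $s;
}

# Replacement text is not fed back into the match either,
# so this cannot run away.
sub double_each {
    my ($s) = @_;
    $s =~ s/a/aa/g;
    return $s;
}

print strip_once("aaab"), "\n";        # aab
print strip_repeatedly("aaab"), "\n";  # b
print double_each("aa"), "\n";         # aaaa
```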
 
robic0

Why do you keep writing gibberish?

What you fail to understand in this case is that the string position is not
the issue; you'd also have to tell the regex engine to drop the string it
was operating on and instead use the just-modified string. I'd love to see
you write any kind of meaningful regular expression when you have to
consider what the string will look like after every substitution, knowing
that what you just substituted in may get operated on again.
Again, the regex is eval'd every loop through with the s///g. If you think
parts of the regex become invalid after the first pass, please let all of us know.
We would like to re-think the concept of what a regex is and what parts become
invalid over the formal execution of it within the engine.
Be specific. You just said that s/^//g is not valid.
My regex is "^". It's a valid regex. Does s/$//g become invalid as well?
Since you may have designed the regex engine, can you tell me what regex is
valid when I write regex, so that I do *NOT* write invalid regex?
I can understand a regex not working, but I need to know what documentation
is invalid now that you have apparently defined regex that is invalid, or
not considered. Because if you know the regex engine well enough to know that
a regex I write works but is invalid, I would like to know about it.
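For reference, a sketch of what ^ and $ actually anchor to under /g and /m (the helper names are hypothetical). /g does not make either anchor "invalid": without /m they still mean the start and end of the string, and /g simply never revisits offset 0. With /m, ^ also matches after each embedded newline.

```perl
use strict;
use warnings;

# Without /m: ^ = start of string, $ = end of string (or just
# before a final newline). Adding /g does not change that.
sub mark_start { my ($s) = @_; $s =~ s/^/>/g; return $s }
sub mark_end   { my ($s) = @_; $s =~ s/$/</g; return $s }

# With /m: ^ also matches after every embedded newline, but not
# after a newline that is the last character of the string.
sub quote_lines { my ($s) = @_; $s =~ s/^/> /mg; return $s }

print mark_start("abc"), "\n";    # >abc
print mark_end("abc"), "\n";      # abc<
print quote_lines("one\ntwo\n");  # "> one" and "> two" on two lines
```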
 
robic0

Why do you keep writing gibberish?

What you fail to understand in this case is that the string position is not
the issue; you'd also have to tell the regex engine to drop the string it
was operating on and instead use the just-modified string. I'd love to see
you write any kind of meaningful regular expression when you have to
consider what the string will look like after every substitution, knowing
that what you just substituted in may get operated on again.

I got to follow this again. All you say is valid in the global operation.
But I would find it hard to be true on boundary conditions. To say what you
just said, because of substitution, invalidated ^$. In that case the position
becomes invalid because the end is invalid. If you can't validate the end
every time, you cannot validate the beginning ever when the possibility exists
the substitution takes away leading characters.

Make sense?

robic0
 
robic0

I got to follow this again. All you say is valid in the global operation.
But I would find it hard to be true on boundary conditions. To say what you
just said, because of substitution, invalidated ^$. In that case the position
becomes invalid because the end is invalid. If you can't validate the end
every time, you cannot validate the beginning ever when the possibility exists
the substitution takes away leading characters.

Make sense?

robic0
Otherwise the only validity of regex in a global sense is a floating start and
pseudo end within the regex engine, with the position being the start anchor
and *NO* end context validity.
If that's it, then there have been conceptual errors from the genius Perl writers.
Another nail in the coffin...

robic0
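Perl does have an anchor that floats with the match position: \G, which matches where the previous /g match ended. A minimal sketch contrasting it with ^ under s///g (the sub names and sample string are invented for the example):

```perl
use strict;
use warnings;

# ^ stays tied to the start of the string: only the first "_"
# can be replaced, however many passes /g makes.
sub caret_g { my ($s) = @_; $s =~ s/^_/-/g; return $s }

# \G is tied to where the previous /g match ended, so it keeps
# matching a run of "_" from the start and stops at the first
# character that breaks the run.
sub g_anchor { my ($s) = @_; $s =~ s/\G_/-/g; return $s }

print caret_g("___a_"), "\n";   # -__a_
print g_anchor("___a_"), "\n";  # ---a_
```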
 
TOC

ok... I added the ability for the script to append a "_" to the end of the
file / dir (but before the extension, if one exists.)

I am sure my code sucks, can you smart people tell me how to do it right?


=)

I am a total newbie, so if there is something obvious, tell me... I won't
be offended.



#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Fatal;
use File::Find;

finddepth(\&f, ".");
sub f
{
    return if /^\./ or /lost\+found/;
    my $o = $_ ;
    y/A-Z/a-z/;
    y/a-z0-9._/_/c;
    s/^_+//;
    s/\.mpeg$/.mpg/;
    s/\.ram$/.rm/;
    s/\.qt$/.mov/;
    s/\.jpeg$/.jpg/;
    y/_//s;
    y/.//s;
    s/_\././g;
    s/\._/_/g;
    if ($_ ne $o)
    {
        while (-e $_)
        {
            print "\n $File::Find::dir/$_ already exists \n";
            if (/\./)
            {
                s/^(.*)\.(.*)$/$1_.$2/;
            }
            else
            {
                s/(.*)/$1_/;
            }
        }
        if (rename $o, $_) {print "\n $File::Find::name -> \n $File::Find::dir/$_\n";}
        else {warn "\n failed to rename $o to $_, left as $o\n ";}
    }
}
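One way to make a script like this easier to test is to pull the name clean-up into its own function and try it on plain strings. This is a sketch, not TOC's code: clean_name is a hypothetical name, and it uses lc in place of y/A-Z/a-z/ (as suggested later in the thread); otherwise the steps are the ones in the callback above.

```perl
use strict;
use warnings;

# Hypothetical helper: the clean-up steps from the File::Find
# callback, applied to a plain string so no files are touched.
sub clean_name {
    my ($name) = @_;
    local $_ = lc $name;   # lower-case (lc instead of y/A-Z/a-z/)
    y/a-z0-9._/_/c;        # every other character becomes "_"
    s/^_+//;               # drop leading underscores
    s/\.mpeg$/.mpg/;
    s/\.ram$/.rm/;
    s/\.qt$/.mov/;
    s/\.jpeg$/.jpg/;
    y/_//s;                # squeeze runs of "_"
    y/.//s;                # squeeze runs of "."
    s/_\././g;             # "_." becomes "."
    s/\._/_/g;             # "._" becomes "_"
    return $_;
}

for my $name ('My Photo.JPEG', '__Foo  Bar_.MPEG', 'a..b__.QT') {
    printf "%-20s -> %s\n", $name, clean_name($name);
}
```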
 
TOC

ok... I added the ability for the script to append a "_" to the end of
the file / dir (but before the extension, if one exists.)


what I mean is it now appends a "_" to the end of the filename if the
target filename already exists.
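The "already exists" loop can be sketched the same way, with the existence check passed in as a code reference so the logic can be tried without real files. next_free_name and the %taken table are hypothetical; in the script the check would be -e on the candidate name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the while (-e $_) loop: keep adding
# "_" (before the last dot, if there is one) until $exists says
# the name is free.
sub next_free_name {
    my ($name, $exists) = @_;
    while ($exists->($name)) {
        if ($name =~ /\./) {
            $name =~ s/^(.*)\./${1}_./;   # "_" before the last dot
        }
        else {
            $name .= '_';                 # no extension: append "_"
        }
    }
    return $name;
}

# Stand-in for the filesystem; in the script: sub { -e $_[0] }
my %taken  = map { $_ => 1 } qw(song.mp3 song_.mp3 readme);
my $exists = sub { $taken{ $_[0] } };

print next_free_name('song.mp3', $exists), "\n";  # song__.mp3
print next_free_name('readme',   $exists), "\n";  # readme_
print next_free_name('new.txt',  $exists), "\n";  # new.txt
```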
 
Tad McClellan

TOC said:
I am a total newbie, so if there is something obvious, tell me... I won't
be offended.

sub f


One-character subroutine names suck.

You should try to choose meaningful names.

my $o = $_ ;


One-character variable names suck too.

You should try to choose meaningful names.

y/A-Z/a-z/;


$_ = lc $_; # respects locales, y/// does not
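A quick way to see the difference is a character outside A-Z. This sketch uses a Unicode character string rather than a locale (the other case lc handles); the sample name is made up, and %vX just prints each character's code point in hex.

```perl
use strict;
use warnings;

# y/A-Z/a-z/ only knows the 26 ASCII capitals; lc applies Perl's
# case rules (Unicode rules for a string like this one).
my $name = "\x{100}BC";   # A WITH MACRON, then "BC"

(my $by_tr = $name) =~ y/A-Z/a-z/;   # only B and C change
my $by_lc = lc $name;                # all three change

printf "tr: %vX\n", $by_tr;   # tr: 100.62.63
printf "lc: %vX\n", $by_lc;   # lc: 101.62.63
```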

s/(.*)/$1_/;


Tacking a character onto the end should *look like* you are
tacking a character onto the end:

$_ .= '_';
 
Dr.Ruud

Tad McClellan wrote:
Tacking a character onto the end should *look like* you are
tacking a character onto the end:

$_ .= '_';

Alternative: s/$/_/;

(but I like your concat better)
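The two spellings only differ if a trailing newline is still attached to the string (say, a name read from STDIN and not chomped), because $ can also match just before a final "\n". A small sketch with invented helper names:

```perl
use strict;
use warnings;

sub show { my ($s) = @_; $s =~ s/\n/\\n/g; return $s }  # make "\n" visible

sub by_concat { my ($s) = @_; $s .= '_';    return $s }  # always at the very end
sub by_subst  { my ($s) = @_; $s =~ s/$/_/; return $s }  # $ can sit before a final "\n"

for my $name ("abc", "abc\n") {
    printf "%-6s concat=%-7s subst=%s\n",
        show($name), show(by_concat($name)), show(by_subst($name));
}
# For "abc" both give "abc_"; for "abc\n" concat gives "abc\n_"
# while subst gives "abc_\n".
```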
 
TOC

Tad McClellan wrote:


Alternative: s/$/_/;

(but I like your concat better)



thanks... eventually I'll make the variable names longer, and add
comments.



#!/usr/bin/perl
use strict;
use warnings;
use Fatal;
use File::Find;

finddepth( \&f, "." );

sub f
{
    return if /^\./ or /lost\+found/;
    my $o = $_;
    $_ = lc $_;
    y/a-z0-9._/_/c;
    s/^_+//;
    s/\.mpeg$/.mpg/;
    s/\.ram$/.rm/;
    s/\.qt$/.mov/;
    s/\.jpeg$/.jpg/;
    y/_//s;
    y/.//s;
    s/_\././g;
    s/\._/_/g;

    if ( $_ ne $o )
    {
        while ( -e $_ )
        {
            print "\n $File::Find::dir/$_ already exists \n";
            if (/\./)
            {
                # insert an "_" between the filename and its extension
                s/^(.*)\.(.*)$/$1_.$2/;
            }
            else
            {
                $_ .= '_';    # append an "_"
            }
        }
        if ( $ARGV[0] )    # do a dry-run
        {
            print "\n $File::Find::name -> \n $File::Find::dir/$_\n";
        }
        elsif ( rename $o, $_ )
        {
            print "\n $File::Find::name -> \n $File::Find::dir/$_\n";
        }
        else
        {
            warn "\n failed to rename $o to $_, left as $o\n ";
        }
    }
}
 
Dr.Ruud

TOC wrote:
my $o = $_;
$_ = lc $_;

Can be combined: my $o = lc;
(see `perldoc -f lc`)
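One thing worth checking about the combined form: lc with no argument lower-cases $_ into the new variable and leaves $_ itself unchanged, so the two variables swap roles rather than behaving exactly like the two original lines. A sketch (sub names and sample input are made up):

```perl
use strict;
use warnings;

# TOC's two statements: $old keeps the original, $_ is lower-cased.
sub two_statements {
    local $_ = shift;
    my $old = $_;
    $_ = lc $_;
    return ($old, $_);
}

# Combined form: $_ keeps the original, $new is lower-cased.
sub combined {
    local $_ = shift;
    my $new = lc;
    return ($_, $new);
}

print join(' ', two_statements('Foo.TXT')), "\n";  # Foo.TXT foo.txt
print join(' ', combined('Foo.TXT')), "\n";        # Foo.TXT foo.txt
```

With the combined form, the y/// and s/// lines that follow would have to be bound to the new variable (for example $new =~ y/a-z0-9._/_/c;), since bare y/// and s/// act on $_.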

s/^(.*)\.(.*)$/$1_.$2/

No need for the second capture group:

s/^(.*)\./$1_./

No need even for the first:

s/\.(?=[^.]*$)/_./

Test: echo 'abc.def.ghi' | perl -pe 'chomp; s/\.(?=[^.]*$)/_./'

(I haven't benchmarked them, but it feels 'just good' to avoid
capturing.)
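If numbers were wanted, the core Benchmark module can compare the three spellings; checking that they agree first is the part that matters. A sketch, with the inputs chosen only for illustration (${1} is used so the "_" is clearly not part of the variable name):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The three spellings from above: "_" before the last dot.
my %variant = (
    two_captures => sub { my $s = shift; $s =~ s/^(.*)\.(.*)$/${1}_.$2/; $s },
    one_capture  => sub { my $s = shift; $s =~ s/^(.*)\./${1}_./;        $s },
    lookahead    => sub { my $s = shift; $s =~ s/\.(?=[^.]*$)/_./;       $s },
);

# Make sure they agree before comparing speed.
for my $input ('abc.def.ghi', 'noext', '.hidden') {
    my %seen;
    $seen{ $_->($input) }++ for values %variant;
    die "variants disagree on '$input'\n" if keys %seen != 1;
}

# Each variant runs for at least one CPU second.
my %bench;
for my $label (keys %variant) {
    my $f = $variant{$label};
    $bench{$label} = sub { $f->('abc.def.ghi') };
}
cmpthese(-1, \%bench);
```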
 
Anno Siegel

Dr.Ruud said:
TOC schreef:
my $o = $_;
$_ = lc $_;

Can be combined: my $o = lc;
(see `perldoc -f lc`)

s/^(.*)\.(.*)$/$1_.$2/

No need for the second capture group:

s/^(.*)\./$1_./

No need even for the first:

s/\.(?=[^.]*$)/_./

Test: echo 'abc.def.ghi' | perl -pe 'chomp; s/\.(?=[^.]*$)/_./'

(I haven't benchmarked them, but it feels 'just good' to avoid
capturing.)

Your intuition is right on the mark there. It isn't benchmarks that speak
for or against one alternative or the other but conceptual simplicity or,
almost equivalently, readability and general neatness.

From that point of view, I'd draw the line at your first alternative. It
expresses directly what is going to happen. The lookahead variant is
harder to understand.

If timing is really the issue, that's when readability and neatness go
overboard and we'll use what screams, to use one of Larry Wall's favorite
expressions. But that applies only locally to hot spots found by profiling,
never to the style of a whole program, and never before the whole program
works correctly.

Anno
 
Dr.Ruud

Anno Siegel wrote:
Dr.Ruud:
TOC:
s/^(.*)\.(.*)$/$1_.$2/

No need for the second capture group:

s/^(.*)\./$1_./

No need even for the first:

s/\.(?=[^.]*$)/_./

Test: echo 'abc.def.ghi' | perl -pe 'chomp; s/\.(?=[^.]*$)/_./'

(I haven't benchmarked them, but it feels 'just good' to avoid
capturing.)

Your intuition is right on the mark there. It isn't benchmarks that
speak for or against one alternative or the other but conceptual
simplicity or, almost equivalently, readability and general neatness.

From that point of view, I'd draw the line at your first alternative.
It expresses directly what is going to happen. The lookahead variant
is harder to understand.

I was tempted for a second to leave out the '^' from that first
alternative, and to use $& for the matched dot in the second (since its
penalty is rather small), but both felt just too pedantic.
:)

For me, the lookahead variant is not harder to understand, so I use it
freely in my code.

If timing is really the issue, that's when readability and neatness go
overboard and we'll use what screams, to use one of Larry Wall's
favorite expressions. But that applies only locally to hot spots
found by profiling, never to the style of a whole program, and never
before the whole program works correctly.

I fully agree.

Both alternatives could use a remark like
# replace the very last <dot> by <underscore><dot>
to show what they are meant to achieve. With many regular expressions it is
good to simply mention what they are supposed to do.
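Ruud's two "pedantic" variants (no ^ in the capturing version, $& for the matched dot in the lookahead version), sketched with the kind of comment he suggests; the /x layout and the sub names are illustrative only.

```perl
use strict;
use warnings;

# Replace the very last <dot> by <underscore><dot>.
sub mark_ext_capture {
    my ($name) = @_;
    $name =~ s/ (.*) \. /${1}_./x;   # greedy .* runs up to the last dot
    return $name;
}

# Replace the very last <dot> by <underscore><dot>.
sub mark_ext_lookahead {
    my ($name) = @_;
    $name =~ s/ \.           # a dot
                (?=[^.]*$)   # followed only by non-dots up to the end
              /_$&/x;        # $& is the dot that matched
    return $name;
}

print mark_ext_capture('abc.def.ghi'), "\n";    # abc.def_.ghi
print mark_ext_lookahead('abc.def.ghi'), "\n";  # abc.def_.ghi
```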
 
