Regexp constructed from command line arguments.

ddtl

Hello everybody,

I wrote a Perl script which is supposed to rename music files: you
pass it the name of a directory to use, a regexp to search for
(let's call it searchexp) and a replacement regexp (let's call it
repexp).

It reads the filenames from the directory, processes each one by
replacing searchexp with repexp, and renames the files.

What I don't understand is why sometimes I pass a searchexp that
matches a filename, but the script won't replace it with repexp.

Here is the relevant problematic code snippet ($opts{'s'} holds
searchexp and $opts{'r'} holds repexp; $i and $n are variables
incremented by the loop):

1) @entry = fileparse($dir_list[$i], @extensions);

2) # if extension is not .mp3, .ogg or similar - no need to process
3) if ($entry[2] eq "") {next;}

4) $name_before[$n] = "$entry[1]" . "$entry[0]" . "$entry[2]";
5) $entry[0] =~ s/\Q$opts{'s'}\E/\Q$opts{'r'}\E/g;
6) $name_after[$n] = "$entry[1]" . "$entry[0]" . "$entry[2]";

7) $n++;

For example, if I have one file named '00.mp3' in the directory,
searchexp is "[\d]{2}" and repexp is "a" (without the surrounding
quotes, of course), then after line 5 executes, $entry[0] is still
"00", as before. I checked with the debugger that $opts{'s'} and
$opts{'r'} really had the values "[\d]{2}" and "a", respectively,
and that the value of $entry[0] didn't change after line 5 executed.

What is wrong here? Stranger still, the script sometimes does work
as it should (for example, if the filename is "_0_0.mp3", searchexp
is "_" and repexp is "a").

ddtl.
 
Tad McClellan

ddtl said:
you
pass it a directory name to use, regexp to search for

3) if ($entry[2] eq "") {next;}


I like to say that like this:

next unless length $entry[2];

'cause excessive punctuation can get in the way of reading it.

4) $name_before[$n] = "$entry[1]" . "$entry[0]" . "$entry[2]";


See the Perl FAQ:

What's wrong with always quoting "$vars"?

and join my anti-excessive-punctuation crusade :)


$name_before[$n] = "$entry[1]$entry[0]$entry[2]";

or, perhaps better:

$name_before[$n] = join '', @entry[1, 0, 2];
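A minimal runnable sketch of that split-and-rejoin round trip, assuming @extensions holds qr// patterns such as qr/\.mp3/ and qr/\.ogg/ (the original post doesn't show what it contains). File::Basename's fileparse() returns (name, directory, suffix), which is why the slice is [1, 0, 2]:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Assumed suffix patterns; the original @extensions is not shown.
my @extensions = (qr/\.mp3/, qr/\.ogg/);

# fileparse() returns (name, directory, suffix).
my @entry = fileparse('/music/00.mp3', @extensions);

# Reassemble directory . name . suffix with a slice instead of quoting.
my $rejoined = join '', @entry[1, 0, 2];

print "@entry[0 .. 2]\n";   # 00 /music/ .mp3
print "$rejoined\n";        # /music/00.mp3
```

When none of the suffix patterns match, fileparse() returns an empty suffix, which is what the "next unless length $entry[2]" test relies on.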

5) $entry[0] =~ s/\Q$opts{'s'}\E/\Q$opts{'r'}\E/g;
                                 ^^          ^^

Why did you put that there?

What were you hoping that would do for you?

Do you _want_ to insert extra backslash characters into
the replacement string?



You should probably be letting Perl do the indexing for you by
using push() instead of doing your own indexing into @name_before
and @name_after.

push @name_before, "$entry[1]$entry[0]$entry[2]"; # Look Ma! No $n needed!
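Here is a sketch of the whole loop written around push(), so Perl keeps track of the indices and neither $i nor $n is needed. The file list, the suffix patterns and %opts are hypothetical stand-ins for the directory listing and the command-line options, and the search pattern is interpolated without \Q so that it is treated as a regex:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical stand-ins for the directory listing and %opts.
my @dir_list   = ('00.mp3', 'notes.txt', '_0_0.ogg');
my @extensions = (qr/\.mp3/, qr/\.ogg/);
my %opts       = ('s' => '\d', 'r' => 'x');

my (@name_before, @name_after);
for my $file (@dir_list) {
    my ($name, $dir, $suffix) = fileparse($file, @extensions);
    next unless length $suffix;          # skip files with other extensions

    push @name_before, "$dir$name$suffix";
    $name =~ s/$opts{'s'}/$opts{'r'}/g;  # no \Q: $opts{'s'} is a regex here
    push @name_after, "$dir$name$suffix";
}

print "$name_before[$_] -> $name_after[$_]\n" for 0 .. $#name_before;
```

For names without a directory part, fileparse() reports the directory as "./", so the rebuilt names come out as "./xx.mp3" and "./_x_x.ogg".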
 
ddtl

You are subjecting $opts{s} to quotemeta(), so you are not
treating the contents of $opts{s} as a regexp to match but rather as a
plain string to match.

Now I understand the problem (I hadn't understood the book's
explanation of quotemeta()): saying \Q$opts{'s'}\E with "[\d]{2}"
actually searches for the literal text "[\d]{2}", because the pattern
becomes "\[\\d\]\{2\}".
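A quick way to see what \Q...\E does to a search string is to call quotemeta() directly. This sketch uses the values from the examples in this thread and shows why "[\d]{2}" never matched "00" while "_" worked: quotemeta() leaves word characters such as "_" alone but backslashes everything else, so the quoted pattern only matches the literal text "[\d]{2}":

```perl
use strict;
use warnings;

my $search = '[\d]{2}';
my $quoted = quotemeta $search;       # what \Q$search\E produces
print "$quoted\n";                    # \[\\d\]\{2\}

my $name      = '00';
my $with_q    = ($name =~ /\Q$search\E/) ? 'match' : 'no match';
my $without_q = ($name =~ /$search/)     ? 'match' : 'no match';
print "with \\Q: $with_q; without: $without_q\n";

# "_" is a word character, so quotemeta() leaves it unchanged.
print quotemeta('_'), "\n";           # _
```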

..
You are also subjecting $opts{r} to quotemeta(). This makes no sense
at all - if $opts{r} contained any \W characters they would actually
appear backslashed in the output.

Yes, that was part of my misunderstanding about quotemeta().
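The replacement side of s/// is an ordinary double-quoted string, so \Q there simply puts quotemeta()'s backslashes into the result. A small sketch with a made-up replacement that contains non-word characters:

```perl
use strict;
use warnings;

my $name    = 'track_01';
my $replace = 'a-b.';                 # hypothetical repexp with \W characters

(my $quoted = $name) =~ s/_/\Q$replace\E/;
(my $plain  = $name) =~ s/_/$replace/;

print "$quoted\n";                    # tracka\-b\.01
print "$plain\n";                     # tracka-b.01
```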

Still, I couldn't get the script to do quite the right thing
(after removing all the \Q and \E, of course): if the replacement
string is "0\1" (with the file name "00.mp3" and searchexp
"([\d]{2})"), the file is renamed to "0\1.mp3" instead of "000.mp3",
though the same substitution works when written literally in the
code, for example:

$a="00";
$a=~s/([\d]{2})/0\1/g;
print "$a";

Here I get "000" as output.

What is the other problem here? Apparently, when a variable
containing the replacement string is expanded, its value is used
as it is, without treating metacharacters as such.

But why is that so, when it does not happen with the expansion of a
variable containing a regexp?

ddtl.
 
ddtl



You should probably be letting Perl do the indexing for you by
using push() instead of doing your own indexing into @name_before
and @name_after.

push @name_before, "$entry[1]$entry[0]$entry[2]"; # Look Ma! No $n needed!

Thanks for the advice, I will take it into account. I am (or rather
was, long ago) used to C, and haven't yet learned the "Perl ways";
you can't after only 4 days of learning and in your first program :).

ddtl.
 
Tad McClellan

$a=~s/([\d]{2})/0\1/g;


You should always enable warnings when developing Perl code!

$a (and $b) are poor choices of variable names: they are used
by sort() too, so their values could get mangled...

The pattern would be easier to read and understand if written as:

$x =~ s/(\d\d)/0$1/g;
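A sketch of that rewrite run under strict and warnings. With "use warnings" in effect, writing \1 on the replacement side of s/// draws the warning "\1 better written as $1", because in the replacement part $1 is the form to use:

```perl
use strict;
use warnings;

my $x = '00';
$x =~ s/(\d\d)/0$1/g;   # $1, not \1, on the replacement side
print $x, "\n";         # 000
```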


print "$a";


See the Perl FAQ:

What's wrong with always quoting "$vars"?

What is another problem here?


You only get one round of interpolation, and you are using it to
fetch the value from the hash. You'd need an additional round
to interpolate it yet again.

But why it is so -


Because if you had:

$_ = '$100.00';
s/(\$1)/$1/g;

and Perl kept doing rounds of interpolation until there was
nothing left to interpolate, it would never finish. :)

So it only does one round.
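A sketch contrasting the two cases in this thread. Variables in the replacement are interpolated exactly once: the text '$1' captured from the data comes out literally, and so does a template such as '0$1' held in a variable (a stand-in here for $opts{'r'}):

```perl
use strict;
use warnings;

# Tad's example: $1 holds the text '$1', which is not interpolated again.
my $price = '$100.00';
$price =~ s/(\$1)/$1/g;
print "$price\n";                 # $100.00

# A replacement template held in a variable (stand-in for $opts{'r'}).
my $template = '0$1';
my $name     = '00';
$name =~ s/(\d\d)/$template/g;
print "$name\n";                  # 0$1
```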

The solution to your problem can be found in the Perl FAQ:

How can I expand variables in text strings?
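One way to apply that idea to the renaming script without eval() is to treat repexp as a template in which only $1 to $9 are special, and to fill those in by hand. The helper name expand_replacement() and the $1..$9 limit are assumptions of this sketch, not something taken from the FAQ:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of $search in $string with
# $template, where $1..$9 in the template stand for the capture groups.
# The template is never eval()ed, so it cannot run arbitrary code.
# The captures are saved in @cap before the inner s/// overwrites $1.
sub expand_replacement {
    my ($string, $search, $template) = @_;
    $string =~ s{$search}{
        my @cap = (undef, $1, $2, $3, $4, $5, $6, $7, $8, $9);
        (my $out = $template) =~ s/\$([1-9])/defined $cap[$1] ? $cap[$1] : ''/ge;
        $out;
    }ge;
    return $string;
}

print expand_replacement('00', '([\d]{2})', '0$1'), "\n";   # 000
```

This sketch only recognises the $1 form, not \1. When the template is passed on a Unix command line it needs single quotes, e.g. '0$1', so that the shell doesn't expand $1 itself.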

containing regexp?


The variable you are speaking of DOES NOT contain a regex.

The replacement string is a (double quotish) _string_, not a regex.
 
Brian McCauley

The solution to your problem can be found in the Perl FAQ:

How can I expand variables in text strings?

See also numerous previous threads discussing this FAQ (do a Google
Groups search on the exact phrase), since the answer given in the FAQ
is not really very good.

A patch to replace it with, IMNSHO, a much better answer can be found at:

http://www.wcl.bham.ac.uk/pub/bam/patches/perl/perlfaq4-scalar-interpolate-take-3.diff

The question/answer reads:

How can I expand/interpolate variables in text strings?

You can process a string through Perl's interpolation
engine like this:

$text = 'this has a $foo in it...\n ...and a $bar';
# Assume $text does not contain "\nEND\n"
chop ( $text = eval "<<END\n$text\nEND\n" );
die if $@;

This will not work, and for good reason, if $text is coming
from a tainted source. For an explanation of how $text
could execute arbitrary Perl code see ``How do I expand
function calls in a string?'' in this section of the FAQ.

If $text is coming from a source external to the Perl
script (typically a file) and you would be content to
trust executable code from that source, then you can simply
untaint data from that source. This is no more or
less dangerous than using "do()". For an explanation of
tainting see the perlsec manpage.

If you do not trust the source that much then you can
limit and launder the parts of the string that are passed
to eval() something like this:

$text =~ s/(\$\w+)/$1/eeg; # needed /ee, not /e

This still gives unrestricted access to your scalar
variables. It is often better to use a hash:

%user_defs = (
foo => 23,
bar => 19,
);
$text =~ s/\$(\w+)/$user_defs{$1}/g;

For other variations on the theme of text templates see
the sprintf() function and numerous modules on CPAN.

However, the OP's question was actually about s///. Whilst the FAQ
that Tad mentions is somewhat applicable, I think the OP's exact
question appears often enough that it should qualify as a FAQ in its
own right. Like most FAQs this one seems to come in bursts: it goes
unasked for months, then is asked by several people in the course of a
couple of weeks (as it has been in the last couple of weeks).

For a good previous thread on this FAQ, please see the thread following
message <[email protected]>.

http://groups.google.com/[email protected]
