Regex substitute w/ match variables

Discussion in 'Perl Misc' started by Gary sCHENK, May 5, 2005.

  1. Gary sCHENK

    Gary sCHENK Guest

    I am self-taught at Perl. I use Perl a few times a year, mostly to
    process text files. I'm trying to rename files in a directory. My
    skills are quite rudimentary.

    The files are currently named like this: SR-01-234-5.jpg
    I want to rename them like this: SR-01-234-0005.jpg

    I have a couple of thousand of these. I've already written several
    variations of the following script to get them to this stage,
    but adding the extra zeros has me stumped. This is the script:
    ===============================================================================
    #!perl -w

    opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
    while ( defined ( my $filename = readdir( DH ) ) ) {
        my $foo = $filename;
        if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {
            if ( length( $2 ) == 1 ) {
                $foo =~ s/$1$2$3/$1000$2$3/;
                rename( $filename, $foo );
                #print "$1\n";
            }
        }
    }
    closedir( DH );

    ===============================================================================

    The print statement is an attempt at debugging. When I comment out the
    substitution and the call to rename and just print $1, the output is
    what I expect. When I run this script as shown above, however, files
    come up missing, or the zeros are added in the wrong place.

    Is it possible to use match variables in substitutions? The llama book
    shows match variables being used outside of regular expression
    operations, but not in this fashion.

    And why are the files being deleted? I'm really stumped, and would
    appreciate any and all help.

    All the best,
    Gary Schenk
    Gary sCHENK, May 5, 2005
    #1

  2. "Gary sCHENK" <> wrote in
    news::

    > I am self-taught at Perl. I use Perl a few times a year, mostly to
    > process text files. I'm trying to rename files in a directory. My
    > skills are quite rudimentary.
    >
    > The files are currently named like this: SR-01-234-5.jpg
    > I want to rename them like this: SR-01-234-0005.jpg
    >
    > I have a couple of thousand of these. I've already written several
    > variations of the following script to get them to this stage,
    > but adding the extra zeros has me stumped. This is the script:


    > #!perl -w


    use warnings;

    is better because it allows you to selectively turn warnings on/off. See

    perldoc warnings

    > opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";


    Good.

    > while ( defined ( my $filename = readdir( DH ) ) ) {
    > my $foo = $filename;


    Completely unnecessary.

    > if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {


    I think this is better written as:

    if ($foo =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {

    > if ( length( $2 ) == 1 ) {
    > $foo =~ s/$1$2$3/$1000$2$3/;


    sprintf will work very nicely here:

    my $new = sprintf "$1%4.4d$3", $2;

    > And why are the files being deleted?


    From perldoc -f rename:

    rename OLDNAME,NEWNAME
    Changes the name of a file; an existing file NEWNAME will be
    clobbered.

    I would suggest skipping the rename if the new name is the same as the
    old name.
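
    A minimal sketch of that guard, assuming a hypothetical helper named safe_rename (it is not part of the original script). It skips the rename when the names are equal and also refuses to overwrite a file that already exists, since rename would clobber it silently. The existence check and the rename are separate steps, so this is not safe against another process creating the file in between.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rename only when it is safe.
# Returns 1 on success, 0 when the rename was skipped or failed.
sub safe_rename {
    my ($old, $new) = @_;

    return 0 if $old eq $new;    # same name: nothing to do

    if (-e $new) {               # rename() would clobber this file
        warn "Not renaming $old: $new already exists\n";
        return 0;
    }

    return rename($old, $new) ? 1 : 0;
}
```

    Called as safe_rename($old, $new) in place of the bare rename, it leaves colliding files alone and prints a warning naming both paths.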

    Also, note perldoc -f readdir:

    If you're planning to filetest the return values out of a
    "readdir", you'd better prepend the directory in question.
    Otherwise, because we didn't "chdir" there, it would have been
    testing the wrong file.

    So, you should either chdir to the working directory, or prepend the
    directory name to each file name.

    Putting all of this together, here is a revised version of your script:

    #! /usr/bin/perl

    use strict;
    use warnings;

    use File::Spec::Functions 'catfile';

    my $dir = shift || $ENV{TMP};

    opendir my $dh, $dir
        or die "Error opening directory $dir: $! ";

    while( my $old = readdir $dh ) {
        if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
            my $new = sprintf "$1%4.4d$3", $2;

            if($new eq $old) {
                print "Skipping $old\n";
                next;
            }

            $old = catfile $dir, $old;
            $new = catfile $dir, $new;

            print "$old => $new\n";

            # rename $old, $new
            #     or warn "Error renaming $old to $new: $!";
        }
    }

    closedir $dh or die "Error closing directory $dir: $!";

    Sinan


    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, May 5, 2005
    #2

  3. Anno Siegel

    Anno Siegel Guest

    Gary sCHENK <> wrote in comp.lang.perl.misc:
    > I am self-taught at Perl. I use Perl a few times a year, mostly to
    > process text files. I'm trying to rename files in a directory. My
    > skills are quite rudimentary.
    >
    > The files are currently named like this: SR-01-234-5.jpg
    > I want to rename them like this: SR-01-234-0005.jpg
    >
    > I have a couple of thousand of these. I've already written several
    > variations of the following script to get them to this stage,
    > but adding the extra zeros has me stumped. This is the script:
    > ===============================================================================
    > #!perl -w


    Why not strict? Your program seems to be written for it.

    > opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
    > while ( defined ( my $filename = readdir( DH ) ) ) {
    > my $foo = $filename;
    > if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {


    Your regex is fine though slightly more general than your example. However,
    substitution with s/// isn't always the best way to turn a string into
    another. For formatting numbers, there is sprintf.

    > if ( length( $2 ) == 1 ) {
    > $foo =~ s/$1$2$3/$1000$2$3/;
    > rename( $filename, $foo );
    > #print "$1\n";
    > }
    > }
    > }
    > closedir( DH );
    >
    > ===============================================================================
    >
    > The print statement is an attempt at debugging. When I comment out the
    > substitution and the call to rename and just print $1, the output is
    > what I expect. When I run this script as shown above, however, files
    > come up missing, or the zeros are added in the wrong place.


    So why didn't you print out $foo for debugging? That way you'd have known
    what you are trying to rename your files to. You are probably renaming
    many files all to the same name. That's the same as deleting all but one
    of them.

    > Is it possible to use match variables in substitutions? The llama book
    > shows match variables being used outside of regular expression
    > operations, but not in this fashion.


    It's using them inside *another* regex that's problematic. Every regex
    evaluation resets them. You can assign the matches to named variables
    that don't have that problem (see below).

    Here's how I would do it (your regex is unchanged):

    my $filename = 'SR-01-234-5.jpg';
    my ( $pre, $num, $suf) =
    $filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
    my $foo = sprintf "%s%04d%s", $pre, $num, $suf;
    print "$filename -> $foo\n";

    Anno
    Anno Siegel, May 5, 2005
    #3
  4. Anno Siegel

    Anno Siegel Guest

    A. Sinan Unur <> wrote in comp.lang.perl.misc:
    > "Gary sCHENK" <> wrote in
    > news::


    [Good advice]

    > if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
    > my $new = sprintf "$1%4.4d$3", $2;


    Just one note. It is generally a bad idea to put variable strings into
    a sprintf format. They could decide to contain a "%" one day. I realize
    the regex doesn't allow this in this case, but on principle I'd do

    sprintf '%s%4.4d%s', $1, $2, $3;
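
    To illustrate the hazard with a made-up prefix (the value 'rate-%s-' is hypothetical and could not come from the regex above): when the prefix is interpolated into the format, its own "%s" consumes the number, and the real "%04d" is left without an argument.

```perl
use strict;
use warnings;

# Hypothetical prefix containing a sprintf conversion.
my $prefix = 'rate-%s-';
my $num    = 5;

# Risky: the prefix becomes part of the format string, so its "%s"
# swallows $num. Warnings are silenced only to keep the demo quiet.
my $risky = do { no warnings; sprintf "${prefix}%04d", $num };

# Safe: the prefix is passed as data and is never parsed as a format.
my $safe = sprintf '%s%04d', $prefix, $num;

print "risky: $risky\n";
print "safe:  $safe\n";    # safe:  rate-%s-0005
```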

    Anno
    Anno Siegel, May 5, 2005
    #4
  5. Damian James

    Damian James Guest

    On 5 May 2005 14:06:26 -0700, Gary sCHENK said:
    > #!perl -w
    >
    > opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
    > while ( defined ( my $filename = readdir( DH ) ) ) {
    > my $foo = $filename;
    > if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {
    > if ( length( $2 ) == 1 ) {
    > $foo =~ s/$1$2$3/$1000$2$3/;
    > rename( $filename, $foo );
    > #print "$1\n";
    > }
    > }
    > }
    > closedir( DH );
    > ...
    > Is it possible to use match variables in substitutions? The llama book
    > shows match variables being used outside of regular expression
    > operations, but not in this fashion.


    That substitution in the inner loop is doing something rather different
    from what you appear to be expecting. Looking at it...

    $foo =~ s/$1$2$3/$1000$2$3/;

    First, the pattern you are matching will be the contents of the
    matched strings from the previous pattern, not the pattern itself,
    and NOT including the parentheses. So taking those strings, concatenated
    together, as a pattern, you are not in fact assigning anything to $1, $2 and
    $3 the second time. This does mean that they retain their previous values.
    The string you are substituting however, starts with the variable $1000,
    which is not populated. Doing "${1}000" instead should help, but I don't
    understand why you are using a substitution here at all. Why not just
    assign the result?

    Have you tried printing $foo? Try replacing the substitution with:

    $foo = "${1}000$2$3";
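
    A small self-contained sketch of the difference, using the sample file name from the original post. The regex is narrowed to a single digit purely for the demonstration.

```perl
use strict;
use warnings;

my $filename = 'SR-01-234-5.jpg';
my ($wrong, $right);

if ($filename =~ /^(SR-\d{2}-\d{3}-)(\d)(\.jpg)$/) {
    # "$1000" is read as capture variable number 1000, which is undef here.
    $wrong = do { no warnings 'uninitialized'; "$1000$2$3" };

    # Braces end the variable name after the 1, so "000" is literal text.
    $right = "${1}000$2$3";
}

print "wrong: $wrong\n";    # wrong: 5.jpg
print "right: $right\n";    # right: SR-01-234-0005.jpg
```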

    > And why are the files being deleted? I'm really stumped, and would
    > appreciate any and all help.


    Well, $1000 is empty, thus "5a.jpg" or something like it
    has been the resulting string several times, so you're renaming
    multiple files to the same name? Couldn't say for sure without
    seeing your directory listing.

    NB, if I were doing this I'd probably have used glob() rather
    than opendir(). Also, perl even on win32 can understand normal
    slashes, so there's no need for the double-backwhacks. I'd still
    only put the path in once, though:

    my $path = 'd:/temp';
    my @files = glob( "$path/SR*.jpg" );

    ....or somesuch
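
    One way the glob() approach might be fleshed out, as a dry-run sketch. The helper name plan_renames is hypothetical; it only reports what would be renamed and renames nothing. Note that glob() returns the path with the directory attached, so the file name has to be split off before matching, and that a directory name containing spaces would need extra care with glob().

```perl
use strict;
use warnings;
use File::Basename 'fileparse';
use File::Spec::Functions 'catfile';

# Hypothetical helper: return [old_path, new_path] pairs for files whose
# trailing number needs zero padding. Nothing is renamed here.
sub plan_renames {
    my ($dir) = @_;
    my @plan;

    for my $old (glob("$dir/SR-*.jpg")) {
        my ($name, $path) = fileparse($old);    # split file name from directory
        next unless $name =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]?\.jpg)$/;

        my $new_name = sprintf '%s%04d%s', $1, $2, $3;
        next if $new_name eq $name;             # already padded

        push @plan, [ $old, catfile($path, $new_name) ];
    }
    return @plan;
}

my $dir = shift || '.';
print "$_->[0] => $_->[1]\n" for plan_renames($dir);
```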

    Hope this helps
    --damian
    Damian James, May 5, 2005
    #5
  6. -berlin.de (Anno Siegel) wrote in
    news:d5e4ub$j6s$-Berlin.DE:

    > A. Sinan Unur <> wrote in comp.lang.perl.misc:
    >> "Gary sCHENK" <> wrote in
    >> news::

    >
    > [Good advice]
    >
    >> if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
    >> my $new = sprintf "$1%4.4d$3", $2;

    >
    > Just one note. It is generally a bad idea to put variable strings
    > into a sprintf format. They could decide to contain a "%" one day. I
    > realize the regex doesn't allow this in this case, but on principle
    > I'd do
    >
    > sprintf '%s%4.4d%s', $1, $2, $3;


    Definitely, that was on my list of things to add, but forgot. Thanks for
    catching it.

    Sinan.

    A. Sinan Unur, May 5, 2005
    #6
  7. "A. Sinan Unur" <> wrote in
    news:Xns964DB3D428E50asu1cornelledu@127.0.0.1:

    Important correction:

    > while( my $old = readdir $dh ) {


    I edited out the crucial test for defined when I was changing things. This
    line should have been, as it was in the original post,

    while( defined (my $old = readdir $dh) ) {

    Sorry.

    Sinan.
    A. Sinan Unur, May 5, 2005
    #7
  8. Damian James

    Damian James Guest

    On 5 May 2005 21:49:17 GMT, Anno Siegel said:
    > ...
    > It's using them inside *another* regex that's problematic. Every regex
    > evaluation resets them. You can assign the matches to named variables
    > that don't have that problem (see below).


    Reset? My understanding was, previous matches are retained (which makes
    what the OP was trying to do more confusing, because sometimes it may
    have succeeded). From perlre:

    NOTE: failed matches in Perl do not reset the match variables, which
    makes it easier to write code that tests for a series of more specific
    cases and remembers the best match.

    > Here's how I would do it (your regex is unchanged):
    >
    > my $filename = 'SR-01-234-5.jpg';
    > my ( $pre, $num, $suf) =
    > $filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
    > my $foo = sprintf "%s%04d%s", $pre, $num, $suf;
    > print "$filename -> $foo\n";


    Indeed.

    --damian
    Damian James, May 5, 2005
    #8
  9. Anno Siegel

    Anno Siegel Guest

    Damian James <> wrote in comp.lang.perl.misc:
    > On 5 May 2005 21:49:17 GMT, Anno Siegel said:
    > > ...
    > > It's using them inside *another* regex that's problematic. Every regex
    > > evaluation resets them. You can assign the matches to named variables
    > > that don't have that problem (see below).

    >
    > Reset? My understanding was, previous matches are retained (which makes
    > what the OP was trying to do more confusing, because sometimes it may
    > have succeeded). From perlre:
    >
    > NOTE: failed matches in Perl do not reset the match variables, which
    > makes it easier to write code that tests for a series of more specific
    > cases and remembers the best match.


    Yes, *failed* matches retain the values. A successful match resets
    them (even if it doesn't capture anything itself). Since the pattern
    /$1$2$3/ would match the original string ("." matching itself), at
    the time of substitution $1, $2 and $3 would be undefined.

    my $filename = 'SR-01-234-5.jpg';
    $filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
    {
    no warnings 'uninitialized';
    $filename =~ s/$1$2$3/$1$2$3/;
    }
    print "*$filename*\n";
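
    The same rule in isolation, as a small sketch that snapshots $1 after a successful match, a failed match, and a second successful match that has no capture groups. The variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $s = 'SR-01-234-5.jpg';

$s =~ /(\d+)\.jpg$/;        # succeeds: $1 is "5"
my $after_first = $1;

$s =~ /(no-such-text)/;     # fails: $1 keeps its previous value
my $after_failed = $1;

$s =~ /SR/;                 # succeeds without groups: $1 becomes undef
my $after_success = $1;

printf "after first match:      %s\n", $after_first;
printf "after failed match:     %s\n", $after_failed;
printf "after successful match: %s\n",
    defined $after_success ? $after_success : 'undef';
```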

    Anno
    Anno Siegel, May 5, 2005
    #9
  10. A. Sinan Unur <> wrote:
    > "A. Sinan Unur" <> wrote in
    > news:Xns964DB3D428E50asu1cornelledu@127.0.0.1:
    >
    > Important correction:
    >
    >> while( my $old = readdir $dh ) {

    >
    > I edited out the crucial test for defined when I was changing things.



    It actually isn't crucial at all.


    > This
    > line should have been, as it was in the original post,
    >
    > while( defined (my $old = readdir $dh) ) {



    perl -MO=Deparse -e 'while( my $old = readdir $dh ) { }'

    and

    perl -MO=Deparse -e 'while( defined (my $old = readdir $dh) ) { }'

    make the same output. :)


    If you leave out the defined(), perl will put it in for you.
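
    A quick way to check this without Deparse, sketched with a scratch directory from File::Temp: create a file named "0", which is false as a boolean, and see whether the loop still visits every entry. As far as I know the implicit defined() is only supplied when the assignment is the entire condition of the while, so a compound condition would still need it spelled out.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

# Scratch directory with a file whose name is false in boolean context.
my $dir = tempdir(CLEANUP => 1);
for my $name ('0', 'a.jpg', 'b.jpg') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

opendir my $dh, $dir or die "Cannot open $dir: $!";
my @seen;
# No explicit defined(): perl supplies it for this loop shape.
while (my $entry = readdir $dh) {
    push @seen, $entry unless $entry eq '.' or $entry eq '..';
}
closedir $dh;

print join(' ', sort @seen), "\n";    # 0 a.jpg b.jpg
```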


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, May 6, 2005
    #10
  11. Damian James

    Damian James Guest

    On 5 May 2005 22:36:35 GMT, Anno Siegel said:
    > ...
    > Yes, *failed* matches retain the values. A successful match resets
    > them (even if it doesn't capture anything itself). Since the pattern
    > /$1$2$3/ would match the original string ("." matching itself), at
    > the time of substitution $1, $2 and $3 would be undefined.
    >


    Ah, nifty.

    --damian
    Damian James, May 6, 2005
    #11
  12. Tad McClellan <> wrote in
    news::

    > A. Sinan Unur <> wrote:
    >> "A. Sinan Unur" <> wrote in
    >> news:Xns964DB3D428E50asu1cornelledu@127.0.0.1:
    >>
    >> Important correction:
    >>
    >>> while( my $old = readdir $dh ) {

    >>
    >> I edited out the crucial test for defined when I was changing things.

    >
    > It actually isn't crucial at all.


    Good to know.

    >> This line should have been, as it was in the original post,
    >>
    >> while( defined (my $old = readdir $dh) ) {

    >
    >
    > perl -MO=Deparse -e 'while( my $old = readdir $dh ) { }'
    >
    > and
    >
    > perl -MO=Deparse -e 'while( defined (my $old = readdir $dh) ) { }'
    >
    > make the same output. :)


    Hmmm. I thought the magic only applied to readline. I stand corrected.

    Thank you.

    Sinan

    A. Sinan Unur, May 6, 2005
    #12
  13. A. Sinan Unur <> wrote:
    > Tad McClellan <> wrote in
    > news::




    >> perl -MO=Deparse -e 'while( my $old = readdir $dh ) { }'
    >>
    >> and
    >>
    >> perl -MO=Deparse -e 'while( defined (my $old = readdir $dh) ) { }'
    >>
    >> make the same output. :)

    >
    > Hmmm. I thought the magic only applied to readline. I stand corrected.



    I used the perldiag description of the warning to figure
    out where it applies:

    =item Value of %s can be "0"; test with defined()

    (W misc) In a conditional expression, you used <HANDLE>, <*> (glob),
    C<each()>, or C<readdir()> as a boolean value.

    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, May 6, 2005
    #13
