File handling and regex

Discussion in 'Perl Misc' started by Luca Villa, Nov 5, 2007.

  1. Luca Villa

    Luca Villa Guest

    Hi all!

    I need help using Perl from the Windows command line to solve the
    following task:

    I have many disordered txt files and subdirectories under the root
    directory "c:\dir", like this:
    c:\dir\foobar.txt
    c:\dir\popo.txt
    c:\dir\sub1\agsds.txt
    c:\dir\sub1\popo.txt
    c:\dir\sub2\hghghg.txt
    c:\dir\sub2\subbb\abc.txt

    These txt files are of three types:
    type1: those that contain a string matching the regular expression
    "abc[0-9]+def"
    type2: those that contain a string matching the regular expression
    "lmn[0-9]+opq"
    type3: those that contain a string matching the regular expression
    "rst[0-9]+uvw"

    I would like to copy all these txt files, using a Perl script run from
    the Windows command line, into a single directory "c:\output". Each
    filename should consist of the number found in the regex match (the
    "[0-9]+" part of the regex) plus a "-type1.txt", "-type2.txt" or
    "-type3.txt" suffix, depending on which of the three regexes above is
    found in the file, giving a result like this:
    c:\output\15-type2.txt
    c:\output\102-type1.txt
    c:\output\33-type1.txt
    c:\output\49-type3.txt
    c:\output\4-type1.txt
    c:\output\335-type2.txt
    c:\output\32-type3.txt

    How can I do it?
     
    Luca Villa, Nov 5, 2007
    #1

  2. Luca Villa wrote:
    > How can I do it?


    *UNTESTED* YMMV :)

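    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Find;
    use File::Copy;

    my $from = 'c:/dir';
    my $to   = 'c:/output';

    # Pattern => type suffix; the capture group holds the number.
    my %trans = qw(
        abc(\d+)def type1
        lmn(\d+)opq type2
        rst(\d+)uvw type3
    );

    find sub {
        # find() chdirs into each directory, so $_ is the bare file name.
        return unless open my $fh, '<', $_;
        return unless -f $fh;        # skip anything that is not a plain file
        read $fh, my $data, -s _;    # "_" reuses the stat from the -f test
        close $fh;
        for my $pat ( keys %trans ) {
            next unless $data =~ $pat;
            copy $File::Find::name, "$to/$1-$trans{$pat}.txt"
                or warn "Could not copy $File::Find::name: $!\n";
            last;
        }
    }, $from;

    __END__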



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Nov 5, 2007
    #2

  3. jordilin

    jordilin Guest

    On Nov 5, 5:52 pm, "John W. Krahn" wrote:
    > find sub {
    > return unless open my $fh, '<', $_;
    > return unless -f $fh;
    > read $fh, my $data, -s _;
    > close $fh;
    > for my $pat ( keys %trans ) {
    > next unless $data =~ $pat;
    > copy $File::Find::name, "$to/$1-$trans{$pat}.txt";
    > last;
    > }
    > }, $from;

    One question: when you write
    read $fh, my $data, -s _;
    shouldn't that be
    read $fh, my $data, -s $_;

    I have searched the web without success. I don't know whether _
    is equivalent to $_ in this particular case.
    Best regards,
    jordi
     
    jordilin, Nov 6, 2007
    #3
  4. jordilin wrote:
    > One question: when you write
    > read $fh, my $data, -s _;
    > shouldn't that be
    > read $fh, my $data, -s $_;
    >
    > I have searched the web without success. I don't know whether _
    > is equivalent to $_ in this particular case.


    No, it doesn't, at least not "literally" or conceptually.
    "_" is the special filehandle that refers to the stat structure left
    behind by the most recent file test or stat operation:

    "If any of the file tests (or either the "stat" or "lstat" operators)
    are given the special filehandle consisting of a solitary underline,
    then the stat structure of the previous file test (or stat operator) is
    used, saving a system call."
    (perldoc -f -s)
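
    To illustrate the quoted passage, here is a minimal sketch. It assumes a
    scratch file created with File::Temp, which is not part of the thread;
    the variable names are made up for the example. The -f test stats the
    file, and "-s _" then reads the size from that cached result rather than
    stat-ing the file again.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "abc123def\n";
close $tmp_fh;

my ($size_cached, $size_fresh);

# -f performs a stat() on the named file and caches the result.
if (-f $tmp_name) {
    # "_" reuses that cached stat buffer instead of calling stat() again.
    $size_cached = -s _;
    # Naming the file again gives the same answer via a second stat().
    $size_fresh = -s $tmp_name;
}

print "cached: $size_cached, fresh: $size_fresh\n";
```

    In John's script the same thing happens with a filehandle: "-f $fh"
    does the stat, and "-s _" picks up the size from it.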


    --
    These are my personal views and not those of Fujitsu Siemens Computers!
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize (T. Pratchett)
    Company Details: http://www.fujitsu-siemens.com/imprint.html
     
    Josef Moellers, Nov 6, 2007
    #4
  5. Luca Villa

    Luca Villa Guest

    Thanks to all, and to John in particular.

    John's solution may well have worked, but I had difficulty adapting it
    to my needs, so I ended up using this alternative solution:


    use File::Find;

    find(\&found, 'c:/dir');


    sub found {
        # Read the whole file into $data.
        unless(open(IN,"<$File::Find::name")) {
            warn "Could not open $File::Find::name: $! (SKIPPING)\n";
            return;
        }
        local $/;    # undefined record separator: <IN> returns the entire file
        my $data=<IN>;
        close(IN);

        # Work out the type and number from the first regex that matches.
        my($type, $number);
        if($data =~ /abc([0-9]+)def/) {
            $number=$1;
            $type=1;
        }
        elsif($data =~ /lmn([0-9]+)opq/) {
            $number=$1;
            $type=2;
        }
        elsif($data =~ /rst([0-9]+)uvw/) {
            $number=$1;
            $type=3;
        }
        else {
            warn "File $File::Find::name is unknown type\n";
            return;
        }

        # Write the contents to the new name, never overwriting an existing file.
        my $outfn="c:/output/$number-type$type.txt";
        if(-e $outfn) {
            warn "File $outfn already exists.\n";
            return;
        }
        unless(open(OUT,">$outfn")) {
            warn "Could not open $outfn: $!\n";
            return;
        }
        print OUT $data;
        close(OUT);
    }
     
    Luca Villa, Nov 9, 2007
    #5
  6. Luca Villa wrote:


    > unless(open(IN,"<$File::Find::name")) {
    > warn "Could not open $File::Find::name: $! (SKIPPING)\n";
    > return;
    > }
    > local $/;
    > my $data=<IN>;
    > close(IN);



    If you are going to mess with the special variables anyway,
    then you could replace all of that with:

    local @ARGV = $_;
    local $/;
    my $data = <>;
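
    The three lines above can be wrapped in a small helper. This is only a
    sketch: the sub name slurp_file, the -f guard and the File::Temp scratch
    file are assumptions added for illustration and are not part of the
    suggestion as posted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical helper name used only for this example.
sub slurp_file {
    my ($path) = @_;
    return unless -f $path;    # skip directories and other non-files
    local @ARGV = ($path);     # <> reads the files named in @ARGV
    local $/;                  # undefined record separator: read it all at once
    my $data = <>;
    return $data;
}

# Scratch file so the example can run on its own.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "line one\nline two\n";
close $fh;

my $content = slurp_file($name);
print $content;
```

    Because both @ARGV and $/ are localized, the caller's command-line
    arguments and record separator are restored when the sub returns.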


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Nov 10, 2007
    #6
  7. Luca Villa

    Luca Villa Guest

    > If you are going to mess with the special variables anyway,
    > then you could replace all of that with:
    >
    > local @ARGV = $_;
    > local $/;
    > my $data = <>;


    I received this error:
    "Can't do inplace edit: . is not a regular file at c:\script.src line
    12."

    inplace edit? What does it want to do?
     
    Luca Villa, Nov 10, 2007
    #7
  8. Luca Villa wrote:
    >> If you are going to mess with the special variables anyway,
    >> then you could replace all of that with:
    >>
    >> local @ARGV = $_;
    >> local $/;
    >> my $data = <>;

    >
    > I received this error:
    > "Can't do inplace edit: . is not a regular file at c:\script.src line
    > 12."



    The error message has nothing to do with the code you quoted above.


    > inplace edit? What does it want to do?



    It wants to edit the file "inplace", that is, with the same name.

    You have turned on inplace editing either with the -i command line
    switch, or by setting the $^I variable somewhere...

    Also, what it is trying to edit is not a file, it is a directory. You
    may want to test what find() is operating on with the -d or -f filetest.
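
    A minimal sketch of that filetest, assuming a scratch directory tree
    built with File::Temp (the tree, the file names and the @files/@skipped
    arrays are made up for the example). find() calls the sub for "." and
    for every subdirectory as well as for plain files, so the sub returns
    early for anything that fails -f.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny scratch tree: one file at the top, one in a subdirectory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub1" or die "mkdir failed: $!";
for my $path ("$root/a.txt", "$root/sub1/b.txt") {
    open my $out, '>', $path or die "Could not create $path: $!";
    print {$out} "abc42def\n";
    close $out;
}

my (@files, @skipped);
find(sub {
    # $_ is the bare name inside the current directory ("." for the root).
    unless (-f $_) {
        push @skipped, $File::Find::name;    # directories end up here
        return;
    }
    push @files, $File::Find::name;          # only plain files get this far
}, $root);

print "files:   @files\n";
print "skipped: @skipped\n";
```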


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Nov 10, 2007
    #8
  9. Luca Villa

    Luca Villa Guest

    Hi Tad,

    I'm not passing any arguments apart from "script.src", which contains
    the script.

    I started getting the error when I switched to the replacement block
    you suggested.

    This is the exact content of script.src, which gives the error
    mentioned above:

    use File::Find;

    find(\&found, 'c:/tempebay/1');

    sub found {
        local @ARGV = $_;
        local $/;
        my $data = <>;


        my($type, $number);
        if($data =~ /<td align="right" nowrap>\s+Item number:\s+(\d+)<\/td>/) {
            $number=$1;
            $type="item_description_html";
        }
        elsif($data =~ /Item number:\s*<img src="http:\/\/pics\.ebaystatic\.com\/aw\/pics\/s\.gif" width="\d+">(\d+)<\/div>/) {
            $number=$1;
            $type="buyers_history_html";
        }
        else {
            warn "File $File::Find::name is of not interesting type, for example an eBay page of item\n";
            return;
        }

        my $outfn="c:/tempebay/2/$number-$type.htm";
        if(-e $outfn) {
            warn "File $outfn already exists.\n";
            return;
        }
        unless(open(OUT,">$outfn")) {
            warn "Could not open $outfn: $!\n";
            return;
        }
        print OUT $data;
        close(OUT);
    }


    ___

    I launch it with: perl script.src
    and despite that initial error message it actually works!

    Do you understand why it wants to do an inplace edit?
     
    Luca Villa, Nov 10, 2007
    #9
