Rename File Using String Found in File?

Discussion in 'Perl Misc' started by He Who Greets With Fire, Mar 4, 2008.

  1. I am trying to write a little script to access many files in a folder,
    parse each file and then if a certain string is found, rename the file
    using a substring of that found string.

    OK, I have posted here before many, many years ago (around 2001), back
    when I did some perl programming. I even wrote a program as my senior
    project to parse financial news stories and assign values to the
    stories based on whether there were negative or positive words in the
    news stories.

    Some people here helped me with that program, and when I finished that
    project I posted the code to the web.

    Now I need some more help. :)

    I have not programmed in a long time. I know perl has changed
    somewhat. I have downloaded the latest ActiveState Win32 Perl and
    installed it.

    I have a file directory named E:/personalinjury. In the file directory
    are 821 files named from 1.htm to 821.htm.

    I want to access each file in turn, and use a regex to parse the file
    contents to see if a string similar to this one is found in it:
    Citation: 20-333 Dorsaneo, Texas Litigation Guide § 333.103
    Some files will not have a string similar to the above string. I am
    not interested in renaming those files.

    If the string above is found, the numbers 20-333 and 333.103 will be
    the ones that vary from file to file. All the words in the string
    above and the section symbol will remain the same from file to file.
    So another string I might find might be:
    20-332 Dorsaneo, Texas Litigation Guide § 332.107

    I am interested in that string of numbers at the end; in the examples
    above, it is 333.103 and 332.107, but there are many other variations.

    So, I want to rename that file to 333.103 from whatever it was before
    (e.g., so I would rename the file from 1.htm or 5.htm or 200.htm, etc.
    to 333.103.htm or 333.105.htm or 332.203.htm or whatever).

    So, my script should strip off that string of digits at the end,
    including the decimal point, and rename the file using that string of
    digits.
    Anyone got any ideas?

    thx
     
    He Who Greets With Fire, Mar 4, 2008
    #1

  2. On Tue, 04 Mar 2008 04:01:28 GMT, He Who Greets With Fire
    <> wrote:


    >I have a file directory named E:/personalinjury.


    Oops! it should be a BACKslash: E:\personalinjury
     
    He Who Greets With Fire, Mar 4, 2008
    #2

  3. He Who Greets With Fire <> wrote:
    > On Tue, 04 Mar 2008 04:01:28 GMT, He Who Greets With Fire
    ><> wrote:
    >
    >
    >>I have a file directory named E:/personalinjury.

    >
    > Oops! it should be a BACKslash: E:\personalinjury



    Slashes that lean either way will work fine if there is no
    "shell" involved.
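To make Tad's point concrete: Perl's own file functions accept forward slashes on Windows too, and the File::Glob documentation recommends forward slashes as the simplest, most portable directory separator in glob patterns, because backslash doubles as its quoting character. A minimal sketch, assuming a scratch directory from the core File::Temp module in place of E:/personalinjury:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical stand-in for E:/personalinjury: a throwaway directory that
# is deleted when the script exits.
my $dir = tempdir(CLEANUP => 1);

# Create a few sample files, joining directory and name with a forward slash.
for my $n (1 .. 3) {
    my $path = "$dir/$n.htm";
    open my $fh, '>', $path or die "could not create '$path': $!";
    print {$fh} "<title>sample $n</title>\n";
    close $fh or die "could not close '$path': $!";
}

# Forward slashes in the glob pattern as well. bsd_glob is used instead of
# plain glob only because it does not split the pattern on whitespace, in
# case the temporary directory's path contains spaces.
my @files = sort bsd_glob("$dir/*.htm");
print "$_\n" for @files;
```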


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 4, 2008
    #3
  4. He Who Greets With Fire <> wrote:

    > I have a file directory named E:/personalinjury. In the file directory
    > are 821 files named from 1.htm to 821.htm
    >
    > I want to access each file in turn,



    foreach my $file ( glob 'E:/personalinjury/*.htm' ) { # untested


    > and use a regex to parse the file
    > contents to see if a string similar to this one is found in it:
    > Citation: 20-333 Dorsaneo, Texas Litigation Guide § 333.103


    open my $PI, '<', $file or die "could not open '$file' $!";
    while ( <$PI> ) {
    next unless /Citation: [\d-]+.*([\d.]+)/;
    my $newfile = $1;


    > So, I want to rename that file to 333.103 from whatever it was before



    rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    last;
    }
    close $PI;
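Tad marked this code as untested. Once the input really does contain the citation text, the capture is still worth checking: the greedy .* consumes as much of the line as it can, so ([\d.]+) is left with only the final character. Here is a self-contained check using the sample citation from the first post. The repaired pattern is an illustrative alternative, not one proposed in the thread:

```perl
use strict;
use warnings;

# Sample line from the first post; the section sign is written as \x{a7}
# only so that this source file stays plain ASCII.
my $line = "Citation: 20-333 Dorsaneo, Texas Litigation Guide \x{a7} 333.103";

# Pattern as posted: .* is greedy, grabs the rest of the line, then gives
# back just enough for ([\d.]+) to match a single character.
my ($greedy) = $line =~ /Citation: [\d-]+.*([\d.]+)/;

# Non-greedy .*? plus an explicit "digits.digits at the end" capture.
my ($anchored) = $line =~ /Citation:\s+[\d-]+.*?(\d+\.\d+)\s*$/;

print "greedy:   $greedy\n";     # prints "greedy:   3"
print "anchored: $anchored\n";   # prints "anchored: 333.103"
```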


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 4, 2008
    #4
  5. OK, thanks, but the script does not seem to rename the files.
    I added some troubleshooting code, most of which I commented out. I
    also moved a copy of the personalinjury folder and all its files
    inside the C:\Perl directory so it can access it directly.


    See below for my additional comments.


    #!/bin/perl


    #sleep 2;
    print "here I am! \n";
    #sleep 2;
    my $counter =1;

    foreach my $file ( glob 'personalinjury/*.htm' ) {

    # print "here I am A \n";
    # sleep 1;

    open my $PI, '<', $file or die "could not open '$file' $!";

    # print "here I am! B \n";
    # sleep 1;

    print $counter;
    print "\n";
    while ( <$PI> ) {
    # print "\n inside whileloop";

    I AM getting to this point here.

    next unless /Citation: [\d-]+.*([\d.]+)/;

    but I never get to this point here--apparently the regex never sees a
    match for the "Citation:" etc string.

    Here is a screen shot of the typical file, with a red arrow pointing
    to the string in this particular file that I want to match.
    I do not know why the regex does not see a match, because it looks
    like it matches it???

    See here:
    http://img225.imageshack.us/img225/91/citationue2.jpg

    my $newfile = $1;
    rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    print "\n renamed a file";
    sleep 1;
    last;
    }#end while

    $counter++;
    print "\n count is ";
    print $counter;
    print "\n";
    #sleep 1;

    close $PI;
    } #end foreach



    I think the script would work ok except that it never sees a match for
    the regex pattern inside the file. I am seeing the script go through
    each substring of all 821 files, but it never sees a match.
     
    He Who Greets With Fire, Mar 4, 2008
    #5
  6. On Tue, 04 Mar 2008 15:21:19 GMT, He Who Greets With Fire
    <> wrote:

    > next unless /Citation: [\d-]+.*([\d.]+)/;


    I think it has to be something to do with the colon or the white
    spaces between the colon and the first of the digits. Is the colon a
    special character in perl? One white space is in the regex, but there
    appear to be two white spaces in the screen shot of the file I linked
    to above....
     
    He Who Greets With Fire, Mar 4, 2008
    #6
  7. He Who Greets With Fire wrote:
    > On Tue, 04 Mar 2008 15:21:19 GMT, He Who Greets With Fire
    > <> wrote:
    >
    >> next unless /Citation: [\d-]+.*([\d.]+)/;

    >
    > I think it has to be something to do with the colon or the white
    > spaces between the colon and the first of the digits. Is the colon a
    > special character in perl? One white space is in the regex, but there
    > appears to be two white spaces in the screen shot of the file I linked
    > to above....


    I usually replace any white space to be matched by "\s+". That catches
    TABs *and* blanks, so maybe
    next unless /Citation:\s+[\d-]+.*([\d.]+)/;
    will do?
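Here is a quick demonstration of the difference, and of its limit: \s+ covers any run of blanks and TABs, but it does not help when the "spaces" in the file are really &nbsp; entities, which is how the gap appears in the raw HTML posted later in this thread. The sample strings are hypothetical:

```perl
use strict;
use warnings;

# Variants of the gap after "Citation:". The last one mimics the raw HTML.
my @labels  = ('one space', 'two spaces', 'TAB', '&nbsp; entities');
my @samples = (
    "Citation: 20-333",
    "Citation:  20-333",
    "Citation:\t20-333",
    "Citation:&nbsp;&nbsp;</td>",
);

# 1 = the pattern matched, 0 = it did not.
my @single = map { /Citation: \d/   ? 1 : 0 } @samples;
my @flex   = map { /Citation:\s+\d/ ? 1 : 0 } @samples;

for my $i (0 .. $#samples) {
    printf "%-16s single space: %d   \\s+: %d\n", $labels[$i], $single[$i], $flex[$i];
}
```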
    --
    These are my personal views and not those of Fujitsu Siemens Computers!
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize (T. Pratchett)
    Company Details: http://www.fujitsu-siemens.com/imprint.html
     
    Josef Moellers, Mar 4, 2008
    #7
  8. Ben Morrow

    Quoth He Who Greets With Fire <>:
    >
    > OK, thanks, but the script does not seem to rename the files.
    > I added some troubleshooting code, most of which I commented out. I
    > also moved a copy of the personalinjury folder and all its files
    > inside the C:\Perl directory so it can access it directly.


    Don't do that. You can set the working directory from within your Perl
    script using the chdir function. In any case, the working directory may
    not be what you expect under Win32.
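Here is a minimal sketch of the chdir approach. It assumes a scratch directory in place of E:/personalinjury. The helper name htm_files_in is hypothetical, and changing back to the previous directory afterwards is a design choice so that later code is unaffected:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Return the .htm files in $dir, changing into it first and changing back
# afterwards so the rest of the script is unaffected.
sub htm_files_in {
    my ($dir) = @_;
    my $old = getcwd();
    chdir $dir or die "could not chdir to '$dir': $!";
    my @files = sort bsd_glob('*.htm');   # names are relative to $dir
    chdir $old or die "could not chdir back to '$old': $!";
    return @files;
}

# Demo with a throwaway directory standing in for E:/personalinjury.
my $dir = tempdir(CLEANUP => 1);
for my $name ('1.htm', '2.htm', 'notes.txt') {
    open my $fh, '>', "$dir/$name" or die "could not create '$dir/$name': $!";
    close $fh;
}

my @found = htm_files_in($dir);
print "cwd is ", getcwd(), "\n";
print "found: @found\n";   # prints "found: 1.htm 2.htm"
```

If the script stays in that directory instead, the relative names returned by the glob and a relative new name given to rename both resolve inside it. As a result, renamed files stay where they were.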

    > See below for my additional comments.
    >
    > #!/bin/perl


    Perl is *never* installed as /bin/perl.

    > #sleep 2;
    > print "here I am! \n";


    Diagnostics like this are better given with warn, which will (a) print
    them to STDERR, where they ought to be, and (b) tell you where you are in
    the script.
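Here is what Ben means, in a few lines. warn writes to STDERR, and when its message does not end in a newline, perl appends the script name and line number. The $SIG{__WARN__} handler exists only so this demo can capture the messages. A real script would leave it out and let the messages go to STDERR:

```perl
use strict;
use warnings;

# Capture warnings in @seen instead of letting them go to STDERR, purely
# so this demo can inspect them. Leave this handler out in a real script.
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

warn "here I am\n";   # ends in a newline: message is used exactly as given
warn "here I am";     # no newline: perl appends " at SCRIPT line N."

print "captured: $_" for @seen;
```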

    > #sleep 2;
    > my $counter =1;
    >
    > foreach my $file ( glob 'personalinjury/*.htm' ) {
    >
    > # print "here I am A \n";
    > # sleep 1;
    >
    > open my $PI, '<', $file or die "could not open '$file' $!";
    >
    > # print "here I am! B \n";
    > # sleep 1;
    >
    > print $counter;
    > print "\n";
    > while ( <$PI> ) {
    > # print "\n inside whileloop";
    >
    > I AM getting to this point here.
    >
    > next unless /Citation: [\d-]+.*([\d.]+)/;
    >
    > but I never get to this point here--apparently the regex never sees a
    > match for the "Citation:" etc string.
    >
    > Here is a screen shot of the typical file, with a red arrow pointing
    > to the string in this particular file that I want to match.
    > I do not know why the regex does not see a match, because it looks
    > like it matches it???
    >
    > See here:
    > http://img225.imageshack.us/img225/91/citationue2.jpg


    *DON'T* do that. Had you done the right thing, and copy-pasted a small
    section of the relevant file into your message, you would have found
    that the file doesn't in fact contain the string 'Citation: whatever' at
    all. It's an HTML file, so there is markup in there as well, and the
    string may well be spread across several lines. Get into the habit of
    looking at files in a text editor before you try parsing them with Perl.

    > my $newfile = $1;
    > rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    > print "\n renamed a file";
    > sleep 1;
    > last;
    > }#end while


    If you had used proper indentation, you would be able to see that
    comments like this are completely useless.

    Ben
     
    Ben Morrow, Mar 4, 2008
    #8
  9. On Tue, 4 Mar 2008 16:15:20 +0000, Ben Morrow <>
    wrote:

    >
    >Quoth He Who Greets With Fire <>:
    >>
    >> OK, thanks, but the script does not seem to rename the files.
    >> I added some troubleshooting code, most of which I commented out. I
    >> also moved a copy of the personalinjury folder and all its files
    >> inside the C:\Perl directory so it can access it directly.

    >
    >Don't do that. You can set the working directory from within your Perl
    >script using the chdir function. In any case, the working directory may
    >not be what you expect under Win32.
    >


    The directory/folder location is not a problem. Like I said, the
    script is indeed able to access the files in the folder and open them
    and increment through them. So, everything seems to be OK on that
    front.



    >> See below for my additional comments.
    >>
    >> #!/bin/perl

    >
    >Perl is *never* installed as /bin/perl.


    But it already works in that regard--the script executes.



    >
    >> #sleep 2;
    >> print "here I am! \n";

    >
    >Diagnostics like this are better given with warn, which will (a) print
    >them to STDERR, where they ought to be, and (b) tell you where you are in
    >the script.


    Well, I'm not actually a programmer, just someone trying to do some
    organization of my files. So, that issue is not a concern right now.



    >
    >> #sleep 2;
    >> my $counter =1;
    >>
    >> foreach my $file ( glob 'personalinjury/*.htm' ) {
    >>
    >> # print "here I am A \n";
    >> # sleep 1;
    >>
    >> open my $PI, '<', $file or die "could not open '$file' $!";
    >>
    >> # print "here I am! B \n";
    >> # sleep 1;
    >>
    >> print $counter;
    >> print "\n";
    >> while ( <$PI> ) {
    >> # print "\n inside whileloop";
    >>
    >> I AM getting to this point here.
    >>
    >> next unless /Citation: [\d-]+.*([\d.]+)/;
    >>
    >> but I never get to this point here--apparently the regex never sees a
    >> match for the "Citation:" etc string.
    >>
    >> Here is a screen shot of the typical file, with a red arrow pointing
    >> to the string in this particular file that I want to match.
    >> I do not know why the regex does not see a match, because it looks
    >> like it matches it???
    >>
    >> See here:
    >> http://img225.imageshack.us/img225/91/citationue2.jpg

    >
    >*DON'T* do that.


    Don't do what?

    >Had you done the right thing, and copy-pasted a small
    >section of the relevant file into your message, you would have found
    >that the file doesn't in fact contain the string 'Citation: whatever' at
    >all. It's an HTML file, so there is markup in there as well, and the
    >string may well be spread across several lines. Get into the habit of
    >looking at files in a text editor before you try parsing them with Perl.


    That is a good point. When I wrote my financial news project that
    parsed news stories for negative and positive words, I passed over all
    words that were surrounded by html brackets.
    Here are two excerpts from the source html for a typical file in that
    folder:


    Here is the HTML source snippet that the script is looking for:
    <td class="toolbar" align=right valign=top width="1%"
    nowrap>Citation:&nbsp;&nbsp;</td>
    <td class="toolbar" valign=top width="99%"><b>21-340 Dorsaneo, Texas
    Litigation Guide § 340.02</b></td>


    Yes, you are correct: the HTML code is throwing off the script.
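With markup like this, one option is to match across the closing </td> and the next opening <td>, treating &nbsp; entities and line breaks as filler. This sketch assumes the real files look exactly like the excerpt above. If the layout varies, the HTML::TableExtract module Tad suggests later in the thread is likely the sturdier route:

```perl
use strict;
use warnings;

# The excerpt from the post, line breaks included. \x{a7} stands in for
# the section sign; the pattern below does not depend on it.
my $html = qq{<td class="toolbar" align=right valign=top width="1%"\n}
         . qq{nowrap>Citation:&nbsp;&nbsp;</td>\n}
         . qq{<td class="toolbar" valign=top width="99%"><b>21-340 Dorsaneo, Texas\n}
         . qq{Litigation Guide \x{a7} 340.02</b></td>\n};

my $section;

# "Citation:", any mix of &nbsp; and whitespace (newlines included), the
# end of that cell, the next cell's opening tags, then the text up to </b>.
# [^<]+ can cross line breaks but cannot run past the next tag.
if ($html =~ m{Citation:(?:&nbsp;|\s)*</td>\s*<td[^>]*>\s*<b>([^<]+)</b>}) {
    my $citation = $1;
    ($section) = $citation =~ /(\d+\.\d+)\s*\z/;   # trailing section number
}

print defined $section ? "section: $section\n" : "no citation found\n";
# prints "section: 340.02"
```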

    Here is another snippet that looks much more promising. The TITLE of
    the html page. This is not the instance of "citation....etc" that I
    was looking for, but now that I see it, it looks like a good candidate
    for use as a filename:

    <title>Get a Document - by Citation - 21-340 Dorsaneo, Texas
    Litigation Guide § 340.02</title>

    Are the angle brackets special characters in perl so that they have to
    be backslashed inside the regex?

    I wonder if this regex would work?
    next unless /\<title\>Get a Document - by Citation -
    [\d-]+.*([\d.]+)\<\/title\>/;







    >
    >> my $newfile = $1;
    >> rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    >> print "\n renamed a file";
    >> sleep 1;
    >> last;
    >> }#end while

    >
    >If you had used proper indentation, you would be able to see that
    >comments like this are completely useless.


    Not sure what you mean?
    >
    >Ben
     
    He Who Greets With Fire, Mar 4, 2008
    #9
  10. On Tue, 04 Mar 2008 11:13:44 -0600, He Who Greets With Fire
    <> wrote:

    >Here is another snippet that looks much more promising. The TITLE of
    >the html page. This is not the instance of "citation....etc" that I
    >was looking for, but now that I see it, it looks like a good candidate
    >for use as a filename:
    >
    ><title>Get a Document - by Citation - 21-340 Dorsaneo, Texas
    >Litigation Guide § 340.02</title>
    >
    >Are the angle brackets special characters in perl so that they have to
    >be backslashed inside the regex?
    >
    >I wonder if this regex would work?
    >next unless /\<title\>Get a Document - by Citation -
    >[\d-]+.*([\d.]+)\<\/title\>/;




    Well, I modified it by adding backslashes in front of the dashes like
    so:
    next unless /\<title\>Get a Document \- by Citation \-
    [\d-]+.*([\d.]+)\<\/title\>/;

    But it still does not work. Again, it does seem to cycle through all
    the files, but nothing matches.
     
    He Who Greets With Fire, Mar 4, 2008
    #10
  11. On Tue, 04 Mar 2008 11:13:44 -0600, He Who Greets With Fire
    <> wrote:

    <snip>




    Well, I changed the program quite a bit so as to be able to target a
    match with the title string shown above. And I am able to find the
    title string and extract the needed numbers, and I have been able to
    place those numbers in a string variable.

    BUT the problem is that the program crashes whenever I try to rename
    the file using the string that I extracted.

    Here is the program.



    #!/bin/perl


    #sleep 2;
    print "here I am! \n";
    sleep 2;
    my $counter =1;
    foreach my $file ( glob 'personalinjury/*.htm' ) {
    # print "here I am A \n";
    sleep 1;

    open my $PI, '<', $file or die "could not open '$file' $!";
    # print "here I am! B \n";
    # sleep 1;

    print $counter;
    print "\n";
    while ( <$PI> ) {
    print "\n inside whileloop";
    sleep 1;
    #<title>Get a Document - by Citation - 21-340 Dorsaneo, Texas
    #Litigation Guide § 340.02</title>

    warn;
    next unless /\<title\>.+Guide\s+§\s+(\d+\.\d+).?\<\/title\>/;
    my $newfile = $+;






    This rename line below is what causes it to crash, so I commented it
    out:
    #rename $file, "$newfile.htm" or die "could not mv '$file' $!";

    But I cannot read what the error message says because the dos window
    just closes. Where can I read what was in the window before it
    crashed? And how can I rename the file? What went wrong with the
    renaming?

    The $newfile variable DOES contain the accurate and desired
    information at this point, as shown by the print statement below.
    print "\n renamed file to ";
    print $newfile, "\n";
    sleep 1;
    last;
    }#end while

    $counter++;
    print "\n count is ";
    print $counter;
    print "\n";
    #sleep 1;

    close $PI;
    } #end foreach
    sleep 5;
     
    He Who Greets With Fire, Mar 4, 2008
    #11
  12. He Who Greets With Fire <> wrote:
    > On Tue, 4 Mar 2008 16:15:20 +0000, Ben Morrow <>
    > wrote:
    >>Quoth He Who Greets With Fire <>:
    >>>
    >>> OK, thanks, but the script does not seem to rename the files.

    ^^^^^^^
    ^^^^^^^
    Eh?

    >>> I added some troubleshooting code, most of which I commented out. I
    >>> also moved a copy of the personalinjury folder and all its files
    >>> inside the C:\Perl directory so it can access it directly.

    >>
    >>Don't do that. You can set the working directory from within your Perl
    >>script using the chdir function. In any case, the working directory may
    >>not be what you expect under Win32.
    >>

    >
    > the directory/folder location is not a problem.



    Exactly so. That is why you should not do that.

    Your "current working directory" and the directory that your perl
    executable is in are not the same thing.

    The directory where your perl binary lives has no connection
    whatsoever to accessing files so moving files under there
    will not solve file accessing problems.

    Your cwd is what matters with regard to filesystem access.


    >>> #!/bin/perl

    >>
    >>Perl is *never* installed as /bin/perl.

    >
    > but it already works in that regard--the script executes



    The fact that your script executes does not prove that perl
    is installed as /bin/perl.

    (Windows programs use some other mechanism for associating files.)

    You should either use a place where perl is usually installed, e.g.:

    #!/usr/bin/perl

    or simply

    #!perl

    if you want to use command line switches, or even

    (nothing)

    leave that line out completely.


    >>> #sleep 2;



    Why do you think that calling sleep() will help with debugging?


    >>> print "here I am! \n";

    >>
    >>Diagnostics like this are better given with warn, which will (a) print
    >>them to STDERR, where they ought to be, and (b) tell you where you are in
    >>the script.

    >
    > Well, I'm not actually a programmer,



    You will need to become a bit of a programmer if you hope to
    write a bit of a program.


    > just someone trying to do some
    > organization of my files.



    If you need some programming done, and you want to do it yourself,
    then you are going to have to learn some programming.


    > So, that issue is not a concern right now.



    So the issue of how to debug programs should be of ultimate concern
    right now, since now is when you have a program that you need to debug!

    Error and warning messages should go on STDERR, not STDOUT.


    >>> foreach my $file ( glob 'personalinjury/*.htm' ) {
    >>>
    >>> # print "here I am A \n";



    The text of the debugging message should tell you where it is in
    the program rather than requiring you to search in the program
    to find where it is. It also gives you a chance to examine the
    data that you are operating on:

    warn "processing '$file' inside the foreach loop\n";


    >>> open my $PI, '<', $file or die "could not open '$file' $!";
    >>>
    >>> # print "here I am! B \n";



    warn "succeeded in opening $file\n";


    >>> while ( <$PI> ) {
    >>> # print "\n inside whileloop";



    warn "processing '$_' inside the while loop\n";


    >>> I AM getting to this point here.
    >>>
    >>> next unless /Citation: [\d-]+.*([\d.]+)/;
    >>>
    >>> but I never get to this point here--apparently the regex never sees a
    >>> match for the "Citation:" etc string.



    Then you should modify the regex so that it sees a match for
    the "Citation:" etc string.

    To do that, you need to know *exactly* what the data looks like,
    and you probably need to read some of the standard documentation
    that covers regexes.
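A throwaway harness like the one below makes this quick. Paste the candidate lines and the pattern under test into it, run it, and read the report. The pattern in $re is a hypothetical candidate, not one from the thread. It accepts anything non-blank between "Guide" and the number, so both a literal section sign and an &sect; entity get through:

```perl
use strict;
use warnings;

# Pattern under test (hypothetical candidate): "Guide", a separator token,
# then digits.digits captured.
my $re = qr/Guide\s+\S+\s+(\d+\.\d+)/;

# Candidate input lines; edit these to whatever the real files contain.
my @lines = (
    "Citation: 20-333 Dorsaneo, Texas Litigation Guide \x{a7} 333.103",
    "21-340 Dorsaneo, Texas Litigation Guide &sect; 340.02",
    "no citation on this line",
);

my @results;   # captured number per line, or undef for no match
for my $i (0 .. $#lines) {
    if ($lines[$i] =~ $re) {
        push @results, $1;
        print "line $i: MATCH    [$1]\n";
    }
    else {
        push @results, undef;
        print "line $i: NO MATCH\n";
    }
}
```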


    >>> Here is a screen shot of the typical file, with a red arrow pointing
    >>> to the string in this particular file that I want to match.
    >>> I do not know why the regex does not see a match, because it looks
    >>> like it matches it???
    >>>
    >>> See here:
    >>> http://img225.imageshack.us/img225/91/citationue2.jpg

    >>
    >>*DON'T* do that.

    >
    > Don't do what?



    Don't post a screenshot. Your program is processing text, not graphics.

    Don't post a URL and expect people to go follow it to find out what
    you are talking about.


    >>Had you done the right thing, and copy-pasted a small
    >>section of the relevant file into your message,



    Do post a copy-pasted section of the data into your message.


    >> you would have found
    >>that the file doesn't in fact contain the string 'Citation: whatever' at
    >>all. It's an HTML file, so there is markup in there as well, and the
    >>string may well be spread across several lines. Get into the habit of
    >>looking at files in a text editor before you try parsing them with Perl.

    >
    > That is a good point.



    Well duh.

    When crafting a regular expression, it is *essential* to know *exactly*
    what the data you are trying to match looks like.


    > Here are two excerpts from the source html for a typical file in that
    > folder:
    >
    >
    ><td class="toolbar" align=right valign=top width="1%"
    > nowrap>Citation:&nbsp;&nbsp;</td>
    ><td class="toolbar" valign=top width="99%"><b>21-340 Dorsaneo, Texas
    > Litigation Guide § 340.02</b></td>



    If the data you want is in an HTML table, then you should use
    a module that will process an HTML table for you, such
    as HTML::TableExtract.


    > Are the angle brackets special characters in perl so that they have to
    > be backslashed inside the regex?



    Yes, angle brackets are special characters in Perl, they mean
    "less than" and "greater than" and whatnot.

    No, angle brackets are not special characters in a regular
    expression, so they do not need to be backslashed.

    The Perl Language and the Regular Expression Language are different
    languages, so the funny characters mean different things depending
    on which language you are in.
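A quick way to confirm this is shown below. It uses a shortened, one-line version of the title quoted earlier in the thread, and the variable names are illustrative. Backslashing a punctuation character that is not special, such as < or >, is harmless, because the escaped character still matches itself:

```perl
use strict;
use warnings;

my $title = '<title>Get a Document - by Citation - 21-340</title>';

# < and > are ordinary characters in a regex, so no backslashes are needed.
# Using m{} as the delimiter also avoids escaping the / in </title>.
my ($plain) = $title =~ m{<title>(.*?)</title>};

# Backslashing them anyway does no harm: \< and \> match < and >.
my ($escaped) = $title =~ /\<title\>(.*?)\<\/title\>/;

print "plain:   $plain\n";
print "escaped: $escaped\n";
```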


    > I wonder if this regex would work?



    The way to answer that is to write a teeny tiny program
    and *see* for yourself if it works or not.


    >>> my $newfile = $1;
    >>> rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    >>> print "\n renamed a file";
    >>> sleep 1;
    >>> last;
    >>> }#end while

    >>
    >>If you had used proper indentation, you would be able to see that
    >>comments like this are completely useless.

    >
    > Not sure what you mean?



    Me either.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 5, 2008
    #12
  13. Ben Morrow

    [please trim your quotations]

    Quoth He Who Greets With Fire <>:
    > On Tue, 04 Mar 2008 11:13:44 -0600, He Who Greets With Fire
    > <> wrote:
    > >On Tue, 4 Mar 2008 16:15:20 +0000, Ben Morrow <>
    > >wrote:
    > >
    > >>Diagnostics like this are better given with warn, which will (a) print
    > >>them to STDERR, where they ought to be, and (b) tell you where you are in
    > >>the script.

    > >
    > >Well, I'm not actually a programmer, just someone trying to do some
    > >organization of my files. So, that issue is not a concern right now.


    The fact you don't consider yourself a programmer is irrelevant. You are
    writing a program, and it will make your life easier if you do it
    properly.

    > >>*DON'T* do that.

    > >
    > >Don't do what?


    Take screenshots of HTML files, rather than posting a sample.

    <snip>
    > >Here is another snippet that looks much more promising. The TITLE of
    > >the html page. This is not the instance of "citation....etc" that I
    > >was looking for, but now that I see it, it looks like a good candidate
    > >for use as a filename:
    > >
    > ><title>Get a Document - by Citation - 21-340 Dorsaneo, Texas
    > >Litigation Guide § 340.02</title>


    Is this spread across two lines in the HTML file? If so, then reading
    the file line-by-line with while (<>) will never give you a string that
    matches your regex. You would be better off reading the whole file with
    File::Slurp.
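File::Slurp is a CPAN module. The sketch below swaps in the core idiom of undefining $/ inside a local scope, which reads the whole file in one go. The sample file is an assumption modelled on the two-line title quoted above, with &sect; standing in for the section sign:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one string: with $/ (the input record separator)
# undefined for the duration of this sub, <$fh> returns everything at once.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "could not open '$path': $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Sample file whose <title> is wrapped across two lines, as in the post.
my ($tmp, $path) = tempfile(SUFFIX => '.htm', UNLINK => 1);
print {$tmp} "<html><head>\n",
             "<title>Get a Document - by Citation - 21-340 Dorsaneo, Texas\n",
             "Litigation Guide &sect; 340.02</title>\n",
             "</head></html>\n";
close $tmp;

# Line by line: no single line contains both <title> and </title>.
open my $in, '<', $path or die "could not open '$path': $!";
my $line_hits = grep { m{<title>.*</title>} } <$in>;
close $in;

# Whole file: the /s modifier lets . match the newline inside the title.
my $text = slurp($path);
my ($section) = $text =~ m{<title>.*?(\d+\.\d+)\s*</title>}s;

print "line-by-line matches: $line_hits\n";   # prints "line-by-line matches: 0"
print "whole-file section:   $section\n";     # prints "whole-file section:   340.02"
```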

    > >Are the angle brackets special characters in perl so that they have to
    > >be backslashed inside the regex?


    No. See perldoc perlreref: it lists all the special characters.

    <snip>
    > >>If you had used proper indentation, you would be able to see that
    > >>comments like this are completely useless.

    > >
    > >Not sure what you mean?


    If you write your code like this

    while (<>) {
        #lots of code
        #lots of code
        #lots of code
        #lots of code
    }

    then there is no need for any '#end while' comments: you can see it's
    the end of the while from the indentation. Any half-decent editor will
    find matching braces for you, as well. The comment just becomes noise
    that obscures the important bits of the code.

    > well, I changed the program quite a bit so as to be able to target a
    > match with the title string shown above. And I am able to find the
    > title string and extract the needed numbers, and I have been able to
    > place those numbers in a string variable.
    >
    > BUT the problem is that the program crashes whenever I try to rename
    > the file using the string that I extracted.


    I seriously doubt it 'crashes'. That would be a serious bug in perl.
    More likely, the rename fails for some reason and the program exits with
    an error.

    > this rename line below is what causes it to crash, so i commented it
    > out:
    > #rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    >
    > But I cannot read what the error message says because the dos window
    > just closes. Where can I read what was in the window before it
    > crashed?


    Open a cmd window yourself (Start / Run / cmd), cd into the appropriate
    directory and run the script yourself with 'perl script.pl'. Then the
    window won't go away.

    > And how can I rename the file? What went wrong with the
    > renaming?


    No one can tell that from here until you can see what the error message
    said.

    Ben
     
    Ben Morrow, Mar 5, 2008
    #13
  14. On Wed, 05 Mar 2008 02:45:34 +0000, Ben Morrow wrote:

    >> >>If you had used proper indentation, you would be able to see that
    >> >>comments like this are completely useless.
    >> >
    >> >Not sure what you mean?

    >
    > If you write your code like this
    >
    > while (<>) {
    >     #lots of code
    >     #lots of code
    >     #lots of code
    >     #lots of code
    > }
    >
    > then there is no need for any '#end while' comments: you can see it's
    > the end of the while from the indentation. Any half-decent editor will
    > find matching braces for you, as well. The comment just becomes noise
    > that obscures the important bits of the code.


    Also, if it really is "lots of code" you should put that code in subs.
    The while loop instantly becomes much more readable:

    while (<>) {
        do_this();
        do_that($param, $param);
        if (check_something($param)) {
            log_error();
            last;
        }
        remainder_of_processing();
    }
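    Applied to this thread's job, the same shape might look like the sketch
    below. find_section, new_name and report are hypothetical helper names.
    The script only prints what it would rename (a dry run), so it is safe
    to try first.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Return the section number from a file's "Citation:" line, or undef.
sub find_section {
    my ($path) = @_;
    open my $in, '<', $path or die "could not open '$path': $!";
    while (my $line = <$in>) {
        return $1 if $line =~ /Citation:.*?Guide.*?(\d+\.\d+)/;
    }
    return;    # no citation; $in is closed when it goes out of scope
}

# Same directory, new name: <section>.htm
sub new_name {
    my ($path, $section) = @_;
    return File::Spec->catfile(dirname($path), "$section.htm");
}

sub report {
    my ($from, $to) = @_;
    print "would rename $from -> $to\n";
}

# The loop itself stays short and readable.
for my $file (map { glob($_) } @ARGV) {
    my $section = find_section($file);
    next unless defined $section;
    report($file, new_name($file, $section));
}
```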

    HTH,
    M4
     
    Martijn Lievaart, Mar 5, 2008
    #14
  15. He Who Greets With Fire

    ccc31807 Guest

    On Mar 3, 11:01 pm, He Who Greets With Fire
    <> wrote:
    > I am trying to write a little script to access many files in folder,
    > parse each file and then if a certain string is found, rename the file
    > using a substring of that found string.


    It's important that you follow a methodology that corrects itself at
    each step of the way. If you could post a sample of a file that you
    want to look at, it would be easier to see what you want to do. Also,
    the format of the file is important. I assume that you want to read
    ASCII files.

    The first step would be as follows. I would recommend a very small
    subset of files in the beginning, one having the string you want and
    one not.

    1. begin your file examination loop that iterates through all files
    2. open each file (in turn)
    3. print each line
    4. close each file (in turn)
    5. end the loop.
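    Those five steps might look something like this in Perl. This is a
    sketch, and dump_files is a made-up name. It prints a header line for
    each file so you can tell where each one starts in the redirected
    output.

```perl
use strict;
use warnings;

# Steps 1-5: loop over the files, open each, print every line, close it.
# Output goes to the given filehandle so it can be redirected or captured.
sub dump_files {
    my ($out, @files) = @_;
    for my $file (@files) {                       # 1. loop over the files
        open my $in, '<', $file                   # 2. open each file
            or die "could not open '$file': $!";
        print {$out} "==> $file\n";
        while (my $line = <$in>) {
            print {$out} $line;                   # 3. print each line
        }
        close $in;                                # 4. close each file
    }                                             # 5. end the loop
}

# Usage: perl dump.pl "E:/personalinjury/*.htm" > dump.txt
dump_files(\*STDOUT, map { glob($_) } @ARGV);
```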

    When you run this, you can redirect the output to a text file for your
    convenience. This will show you EXACTLY what Perl sees, which is what
    your regular expression will be matched against. It will also form the
    logic for your program. Once you get this working to your satisfaction,
    you can start to match your regular expression.

    > I have a file directory named E:/personalinjury. In the file directory
    > are 821 files named from 1.htm to 821.htm


    You want to run your script from this directory.

    > I want to access each file in turn, and use a regex to parse the file
    > contents to see if a string similar to this one is found in it:
    > Citation: 20-333 Dorsaneo, Texas Litigation Guide § 333.103
    > Some files will not have a string similar to the above string. I am
    > not interesting in renaming those files.


    What I would do (for starters, anyway) is this. Create a $counter.
    Search each line for the string '33n.nnn '. That is, a literal of two
    3s followed by a digit followed by a literal period followed by three
    digits and a space. If that string is found, rename the file like
    this: 'TLG__33n_nnn_${counter}.txt'. Obviously, if this string can be
    split across multiple lines, you will have to fine-tune your regex, but
    I doubt that you will ever have a line break dividing a section
    number, and I also doubt that you will have many false positives.
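    Here is a rough sketch of that suggestion, with tlg_name as a
    hypothetical helper. It uses (?<!\d) and (?!\d) in place of the
    trailing space. That is an assumption made here, so that a section
    number at the very end of a line still matches. Like step 3 above, it
    only prints the planned names.

```perl
use strict;
use warnings;

# Build a name like TLG__333_103_1.txt from a "33n.nnn" section number in
# $line, or return undef if the line has none. (?<!\d)/(?!\d) keep the
# match from starting or ending in the middle of a longer number.
sub tlg_name {
    my ($line, $counter) = @_;
    return unless $line =~ /(?<!\d)(33\d)\.(\d{3})(?!\d)/;
    return sprintf 'TLG__%s_%s_%d.txt', $1, $2, $counter;
}

my $counter = 0;
for my $file (map { glob($_) } @ARGV) {
    open my $in, '<', $file or die "could not open '$file': $!";
    while (my $line = <$in>) {
        my $name = tlg_name($line, $counter + 1);
        next unless defined $name;
        $counter++;
        print "$file -> $name\n";    # dry run: print first, rename later
        last;                        # one name per file is enough
    }
    close $in;
}
```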

    > Anyone got any ideas?


    Yeah, do it a piece at a time and make sure the prior part works
    perfectly before you take the next step. You don't want to do this
    project in one fell swoop unless you have plenty of practice.

    Also, post a piece of your source file so we can see what it looks
    like.

    CC
     
    ccc31807, Mar 5, 2008
    #15
  16. On Tue, 04 Mar 2008 12:10:16 GMT, Tad J McClellan
    <> wrote:

    >He Who Greets With Fire <> wrote:
    >
    >> I have a file directory named E:/personalinjury. In the file directory
    >> are 821 files named from 1.htm to 821.htm
    >>
    >> I want to access each file in turn,

    >
    >
    > foreach my $file ( glob 'E:/personalinjury/*.htm' ) { # untested
    >
    >
    >> and use a regex to parse the file
    >> contents to see if a string similar to this one is found in it:
    >> Citation: 20-333 Dorsaneo, Texas Litigation Guide § 333.103

    >
    > open my $PI, '<', $file or die "could not open '$file' $!";
    > while ( <$PI> ) {
    > next unless /Citation: [\d-]+.*([\d.]+)/;
    > my $newfile = $1;
    >
    >
    >> So, I want to rename that file to 333.103 from whatever it was before

    >
    >
    > rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    > last;
    > }
    > close $PI;



    I have solved all the problems and have created a working script to
    accomplish the task I needed to do. There are however some problems
    with the code you posted above here. For one, the rename() function
    takes string values as arguments, not file handles. A file handle is a
    pointer, and as such, its value is a numerical value representing an
    address in RAM memory, not a string value. The $file variable you used
    above as the first argument to rename() is a file handle, not a string
    value. Second, you cannot rename a file that is presently open for
    reading. Above, you closed the file later after you tried to rename
    it. You should have closed it before you tried to rename it.




    --He Who Greets With Fire
     
    He Who Greets With Fire, Mar 6, 2008
    #16
  17. He Who Greets With Fire wrote:
    > On Tue, 04 Mar 2008 12:10:16 GMT, Tad J McClellan
    > <> wrote:
    >
    >> He Who Greets With Fire <> wrote:
    >>
    >>> I have a file directory named E:/personalinjury. In the file directory
    >>> are 821 files named from 1.htm to 821.htm
    >>>
    >>> I want to access each file in turn,

    >>
    >> foreach my $file ( glob 'E:/personalinjury/*.htm' ) { # untested
    >>
    >>
    >>> and use a regex to parse the file
    >>> contents to see if a string similar to this one is found in it:
    >>> Citation: 20-333 Dorsaneo, Texas Litigation Guide § 333.103

    >> open my $PI, '<', $file or die "could not open '$file' $!";
    >> while ( <$PI> ) {
    >> next unless /Citation: [\d-]+.*([\d.]+)/;
    >> my $newfile = $1;
    >>
    >>
    >>> So, I want to rename that file to 333.103 from whatever it was before

    >>
    >> rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    >> last;
    >> }
    >> close $PI;

    >
    > I have solved all the problems and have created a working script to
    > accomplish the task I needed to do. There are however some problems
    > with the code you posted above here. For one, the rename() function
    > takes string values as arguments, not file handles.


    Yes, it renames files using (surprise) the names of files.

    > A file handle is a pointer,


    Not in Perl; Perl doesn't have pointers.

    > and as such, its value is a numerical value representing an
    > address in RAM memory, not a string value. The $file variable you used
    > above as the first argument to rename() is a file handle, not a string
    > value.


    The only filehandle in the code above is $PI. $file is a file name
    obtained from "glob 'E:/personalinjury/*.htm'".

    > Second, you cannot rename a file that is presently open for
    > reading. Above, you closed the file later after you tried to rename
    > it. You should have closed it before you tried to rename it.


    Only on Windows. Other operating systems allow a file to be renamed
    whether or not it is opened.
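    Pulling the thread's points together, here is a sketch that closes each
    file before renaming it. That is required on Windows and harmless
    elsewhere. It also anchors the capture after "Guide". In the regex
    quoted above, /Citation: [\d-]+.*([\d.]+)/, the greedy .* gives back
    only one character, so $1 ends up holding just the last digit. The -e
    check that refuses to overwrite an existing target is an extra
    precaution, not something discussed in the thread. The helper names
    are made up.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Return the section (e.g. "333.103") from the Citation: line, or undef.
sub citation_section {
    my ($path) = @_;
    open my $in, '<', $path or die "could not open '$path': $!";
    my $section;
    while (my $line = <$in>) {
        if ($line =~ /Citation:\s*[\d-]+\s.*?Guide.*?(\d+\.\d+)/) {
            $section = $1;
            last;
        }
    }
    close $in;    # closed BEFORE any rename, as Windows requires
    return $section;
}

# Rename $file to <section>.htm in the same directory. Returns the new
# name, or undef if there was nothing to do.
sub rename_by_citation {
    my ($file) = @_;
    my $section = citation_section($file);
    return unless defined $section;
    my $target = File::Spec->catfile(dirname($file), "$section.htm");
    if (-e $target) {
        warn "skipping '$file': '$target' already exists\n";
        return;
    }
    rename $file, $target or die "could not rename '$file' to '$target': $!";
    return $target;
}

# Usage (from a cmd window, so error messages stay visible):
#   perl rename_by_cite.pl "E:/personalinjury/*.htm"
for my $file (map { glob($_) } @ARGV) {
    my $new = rename_by_citation($file);
    print "$file -> $new\n" if defined $new;
}
```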


    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
     
    John W. Krahn, Mar 6, 2008
    #17
  18. He Who Greets With Fire <> wrote:
    > On Tue, 04 Mar 2008 12:10:16 GMT, Tad J McClellan
    ><> wrote:
    >
    >>He Who Greets With Fire <> wrote:
    >>
    >>> I have a file directory named E:/personalinjury. In the file directory
    >>> are 821 files named from 1.htm to 821.htm
    >>>
    >>> I want to access each file in turn,

    >>
    >>
    >> foreach my $file ( glob 'E:/personalinjury/*.htm' ) { # untested
    >>
    >>
    >>> and use a regex to parse the file
    >>> contents to see if a string similar to this one is found in it:
    >>> Citation: 20-333 Dorsaneo, Texas Litigation Guide § 333.103

    >>
    >> open my $PI, '<', $file or die "could not open '$file' $!";
    >> while ( <$PI> ) {
    >> next unless /Citation: [\d-]+.*([\d.]+)/;
    >> my $newfile = $1;
    >>
    >>
    >>> So, I want to rename that file to 333.103 from whatever it was before

    >>
    >>
    >> rename $file, "$newfile.htm" or die "could not mv '$file' $!";
    >> last;
    >> }
    >> close $PI;

    >
    >
    > I have solved all the problems and have created a working script to
    > accomplish the task I needed to do. There are however some problems
    > with the code you posted above here. For one, the rename() function
    > takes string values as arguments, not file handles.



    Right.


    > A file handle is a
    > pointer,



    Wrong.


    > and as such, its value is a numerical value representing an
    > address in RAM memory, not a string value. The $file variable you used
    > above as the first argument to rename() is a file handle, not a string



    No, $file is a string, not a filehandle.


    > value. Second, you cannot rename a file that is presently open for
    > reading.



    Yes I can.


    > Above, you closed the file later after you tried to rename
    > it.



    Works fine on most sensible filesystems.


    > You should have closed it before you tried to rename it.



    Not necessary on most sensible filesystems.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 6, 2008
    #18
  19. Tad J McClellan <> wrote:
    > He Who Greets With Fire <> wrote:
    >> On Tue, 04 Mar 2008 12:10:16 GMT, Tad J McClellan
    >><> wrote:
    >>
    >>>He Who Greets With Fire <> wrote:


    >>> foreach my $file ( glob 'E:/personalinjury/*.htm' ) { # untested


    >>> rename $file, "$newfile.htm" or die "could not mv '$file' $!";


    >> There are however some problems
    >> with the code you posted above here. For one, the rename() function
    >> takes string values as arguments, not file handles.

    >
    >
    > Right.



    Oh. I think I see what happened there.

    A filename glob (perldoc -f glob) is not the same as a "typeglob"
    ("Typeglobs and Filehandles" section in perldoc perldata).
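    Here is a small illustration of the difference, using kind_of as a
    made-up helper. What glob returns is a list of ordinary strings. A
    lexical filehandle from open, by contrast, is a reference to a
    typeglob.

```perl
use strict;
use warnings;

# Report what kind of value we hold: ref() is empty for a plain string and
# "GLOB" for the handle that open() stores in a lexical variable.
sub kind_of {
    my ($value) = @_;
    return ref($value) || 'plain string';
}

my @names = glob '*';                        # filename glob: a list of names
print 'glob result: ', kind_of($names[0]), "\n" if @names;   # plain string

my $text = "some text\n";
open my $fh, '<', \$text or die "could not open in-memory file: $!";
print 'filehandle:  ', kind_of($fh), "\n";   # GLOB
close $fh;
```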


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Mar 7, 2008
    #19
  20. [OT] Re: Rename File Using Strring Found in File?

    On Fri, 07 Mar 2008 14:11:53 +0000, Abigail wrote:

    > _
    > Ben Morrow () wrote on VCCXCIX September MCMXCIII in
    > <URL:news:eek:>: ..
    > ..
    > .. Perl is *never* installed as /bin/perl.
    >
    >
    > Bollocks.
    >
    > Even beside the fact that Perl will install itself pretty much anywhere
    > the person running Configure tells it to (provided the directory exists
    > and permissions allow), there's a major operating system where /bin
    > and /usr/bin are identical.


    Which reminds me of the time I wanted to move /usr. It should be a static
    filesystem, so: create a new slice, copy the contents, mv /usr /usr.old
    and then just ...... boot from CD to fix the mess.

    M4
     
    Martijn Lievaart, Mar 7, 2008
    #20
Similar Threads
  1. Shao Yong

    How to rename a file by using ASP.NET?

    Shao Yong, May 9, 2004, in forum: ASP .Net
    Replies: 2
    Views: 902
    Shao Yong
    May 10, 2004
  2. Jason Heyes
    Replies: 1
    Views: 6,865
    Jaspreet
    Jun 15, 2005
  3. Replies: 3
    Views: 586
  4. rémi

    Rename multiple files using names in a text file

    rémi, Sep 14, 2007, in forum: Python
    Replies: 2
    Views: 483
    rémi
    Sep 15, 2007
  5. Mmcolli00 Mom
    Replies: 1
    Views: 183
    Reid Thompson
    Dec 12, 2008