Extracting only the file names from a file

Discussion in 'Perl Misc' started by Rider, Jul 9, 2009.

  1. Rider

    Hi experts,

    I have this file, inut.txt (listed below). Each line in the file has
    more than 10 fields, but I am only listing a sample format here.

    I need to print out only the file names that end with .txt in
    the output.

    The output should be:
    ===============
    unixFile1.txt
    unixFile2.txt
    unixFile3.txt
    ....
    ...
    ===============

    I am looking for a shorter regexp to extract only the file names
    into the output. I do basic Perl coding on an occasional basis, but
    don't know the right regexp to do it.

    Thanks in advance,
    J


    Here is the input file.
    ===================
    1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3
    ......
    .....
    ===================
    Rider, Jul 9, 2009
    #1

  2. Rider wrote:
    > 1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    > 2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    > 3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3


    while (my $line = <>) {
        my $f1 = (split(/,\s+/, $line))[2];
        print "$f1\n" if defined $f1 and $f1 =~ /\.txt$/;
    }
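A self-contained sketch of the split() approach above, run on the three sample lines from the question rather than on a file (the hard-coded @sample list and the helper name third_field are assumptions for illustration). It shows that field [2] is the whole path, such as usr/tst/unixFile1.txt, not the bare file name the question asks for. Removing the directory part takes one more step.

```perl
use strict;
use warnings;

# Hypothetical helper: return the third field (index 2) of a
# ", "-separated line, or undef if the line has fewer than three fields.
sub third_field {
    my ($line) = @_;
    chomp $line;
    my $field = (split /,\s+/, $line)[2];
    return $field;
}

# Sample lines copied from the question; a real script would read a file.
my @sample = (
    '1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3',
    '2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3',
    '3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3',
);

for my $line (@sample) {
    my $f1 = third_field($line);
    # Prints the whole third column, e.g. "usr/tst/unixFile1.txt".
    print "$f1\n" if defined $f1 && $f1 =~ /\.txt$/;
}
```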


    --
    These are my personal views and not those of Fujitsu Technology Solutions!
    Josef Möllers (Pinguinpfleger bei FTS)
    If failure had no penalty success would not be a prize (T. Pratchett)
    Company Details: http://de.ts.fujitsu.com/imprint.html
    Josef Moellers, Jul 9, 2009
    #2

  3. Rider wrote:
    > 1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    > 2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    > 3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3
    > .....
    > ....
    > ===================



    while (my $line = <>) {
        my $f1 = (split(/\//, (split(/,\s+/, $line))[2]))[-1];
        print "$f1\n" if defined $f1 and $f1 =~ /\.txt$/;
    }
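Here is the same nested-split idea as a self-contained sketch. The helper file_name_from_line is a hypothetical name. The second test line, a path with no directory part, is made up to show that splitting on "/" and taking element [-1] also works when there is no slash.

```perl
use strict;
use warnings;

# Hypothetical helper: take the third ", "-separated field, then return
# the last "/"-separated piece of it (undef if there is no third field).
sub file_name_from_line {
    my ($line) = @_;
    chomp $line;
    my $path = (split /,\s+/, $line)[2];
    return unless defined $path;
    my $name = (split m{/}, $path)[-1];
    return $name;
}

for my $line ('1, bob, usr/tst/unixFile1.txt, boston, text1',
              '0, bob, unixFile0.txt, boston, text1') {
    my $f1 = file_name_from_line($line);
    print "$f1\n" if defined $f1 && $f1 =~ /\.txt$/;
}
```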


    Josef Moellers, Jul 9, 2009
    #3
  4. Rider wrote:
    >I have this file, inut.txt (listed below). each line in the file has
    >more than 10 fields, but I am just listing a sample format here.
    >
    >I need to print out only the filenames that are ending with .txt in
    >the output..
    >
    >The output should be:
    >===============
    >unixFile1.txt
    >unixFile2.txt


    This looks like a standard CSV format, and you want the third column. So
    I would use Text::CSV and grab the third element from each row.

    If you insist on reinventing the wheel, then at least for the sample data
    you have shown you can grab the third element after split()ing each line
    at the comma.
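Text::CSV is a CPAN module, not part of core Perl. As a stand-in, this sketch uses the core Text::ParseWords module's parse_line(), which also handles double-quoted fields that contain commas. This is a substituted technique, not the one Jürgen proposed, and fields_of is a hypothetical helper name. If Text::CSV is installed, its getline() method is the more complete choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split a line at commas (plus any following
# whitespace); double-quoted fields may contain commas. $keep = 0 strips
# the quotes from the returned fields.
sub fields_of {
    my ($line) = @_;
    chomp $line;
    return parse_line(',\s*', 0, $line);
}

my @fields = fields_of('1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3');
print "$fields[2]\n";    # third column: the path
```

Like Text::CSV, this returns the whole path in the third column, so the directory still has to be stripped afterwards.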

    jue
    Jürgen Exner, Jul 9, 2009
    #4
  5. Rider

    On Jul 9, 8:06 am, Josef Moellers wrote:
    > $f1 = (split(/\//, (split(/,\s+/, $line))[2]))[-1];
    > print "$f1\n" if $f1 =~ /\.txt$/;


    Thanks, Josef.

    But I am looking for a one-liner that grabs just the file name ending
    in .txt from each line, without using the split function. I am sure
    I have seen that kind of regular expression before, but I cannot recall
    it now.
    Rider, Jul 9, 2009
    #5
  6. Rider wrote:
    >> Rider wrote:
    >> > I need to print out only the filenames that are ending with .txt in
    >> > the output..

    >>
    >> > The output should be:
    >> > ===============
    >> > unixFile1.txt
    >> > unixFile2.txt
    >> > unixFile3.txt
    >> > ...


    >> > 1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    >> > 2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    >> > 3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3
    >> > .....

    >But I am looking out for a one-liner of just grabbing the only file
    >name that ends with .txt from each line with no need of using split
    >function. I am sure that that I saw that kind of reg expression
    >before, but I can not recall now.


    Unless this is some academic exercise, why do you want to do it the hard
    way?
    The easy and most robust way is to use Text::CSV, grab the
    third item, and then use File::Basename to extract the file name.
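This is a minimal sketch of the File::Basename step, using the core module's basename() on a path taken from the third column. The helper txt_basename and its .txt filter are illustrative and do not come from the thread. The column extraction itself (Text::CSV or split) is left out.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: keep only the last path component and return it
# if it ends in ".txt"; otherwise return undef.
sub txt_basename {
    my ($path) = @_;
    return unless defined $path;
    my $name = basename($path);
    return $name =~ /\.txt\z/ ? $name : undef;
}

for my $path ('usr/tst/unixFile1.txt', 'usr/tst/user input.txt', 'usr/tst/notes.doc') {
    my $name = txt_basename($path);
    print "$name\n" if defined $name;
}
```

basename() has no trouble with spaces in the name, which is exactly where hand-written patterns tend to break.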

    Or actually, in your case you could also use
        substr($line, 16, 13) #might be off by one somewhere
    because the file name starts at offset 16 (counting from 0) and is 13
    characters long.

    Oh, you mean that's just your sample data and the actual data might vary
    in length? Well, too bad, because your actual data may also vary in such
    a way as to make a regexp fail. That is exactly why Text::CSV and
    File::Basename are more robust: they spare you from patching your
    hand-rolled code over and over again whenever you encounter some
    unforeseen data.
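A quick check of the fixed-offset idea, using a hypothetical helper fixed_offset_name. On the sample line, the name does start at offset 16 and is exactly 13 characters long. A line with a two-digit record number (a made-up example) shifts everything by one character, which is the fragility described above.

```perl
use strict;
use warnings;

# Hypothetical helper for the fixed-offset idea: 13 characters starting at
# offset 16 (offsets count from 0).
sub fixed_offset_name {
    my ($line) = @_;
    return substr($line, 16, 13);
}

# Line from the question: the offsets fit exactly.
my $ok = fixed_offset_name('1, bob, usr/tst/unixFile1.txt, boston, text1');
print "$ok\n";        # unixFile1.txt

# Made-up line with a two-digit record number: everything shifts by one.
my $shifted = fixed_offset_name('10, bob, usr/tst/unixFile10.txt, boston, text1');
print "$shifted\n";   # /unixFile10.t
```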

    jue
    Jürgen Exner, Jul 9, 2009
    #6
  7. Rider

    On Jul 9, 9:07 am, Jürgen Exner wrote:
    > It is the easy way and the most robust way to use Text::CSV, grab the
    > third item, and then use File::Basename to extract the file name.


    It is not a CSV file; it is a PHP file with a lot of comments in the
    middle of it as well.
    So I am looking for a regexp that gets me only the file name ending
    with .txt (the file name might have a space in the middle, for
    example "user input.txt" instead of "userinput.txt").
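Taking those constraints at face value (names may contain spaces, no split(), possibly inside PHP source), one possible single regex is sketched below. The pattern and the helper txt_names are assumptions of this sketch, not from the thread. Because spaces are allowed inside a name, any words between the previous "/", comma, or quote and ".txt" become part of the match. That is the cost of permitting spaces.

```perl
use strict;
use warnings;

# Hypothetical pattern, not from the thread: a name that
#   - starts with a character other than "/", ",", whitespace or a quote,
#   - continues with anything except "/", "," or a quote (spaces allowed),
#   - ends in ".txt" that is not followed by a word character or a dot.
my $txt_name = qr{([^/,\s"'][^/,"']*\.txt)(?![\w.])};

# Return every matching name on the line (list-context m//g).
sub txt_names {
    my ($line) = @_;
    return $line =~ /$txt_name/g;
}

print "$_\n" for txt_names('1, bob, usr/tst/user input.txt, boston, text1');
print "$_\n" for txt_names('$file = "usr/tst/unixFile1.txt"; // sample');
```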
    Rider, Jul 9, 2009
    #7
  8. Rider

    On Jul 9, 9:53 am, l v wrote:
    > perl -nle 'print $1 if (/.+\/(.+\.txt)/)' rider.txt
    >
    > where rider.txt is your input file.
    >
    > D:\Perl\source\1>perl -nle "print $1 if (/.+\/(.+\.txt)/)" rider.txt
    > unixFile1.txt
    > unixFile2.txt
    > unixFile3.txt
    >
    > --
    >
    > Len


    Awesome, Len.

    Thanks a bunch; this serves my purpose. Though I have not run it, I can
    see that it would work.
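Here is Len's one-liner wrapped in a small sub so its behaviour can be checked (len_name is a hypothetical name). The first line comes from the question. The other two are made-up edge cases. They show that the pattern needs a "/" before the name, and that the greedy capture runs to the last ".txt" on the line.

```perl
use strict;
use warnings;

# Len's pattern from the thread: the greedy ".+/" backtracks to the last "/"
# that still lets "(.+\.txt)" match, and the greedy capture then extends
# to the last ".txt" on the line.
sub len_name {
    my ($line) = @_;
    return $line =~ /.+\/(.+\.txt)/ ? $1 : undef;
}

for my $line (
    '1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3',
    '0, bob, unixFile0.txt, boston, text1, text2, text3',   # no "/": no match
    '5, bob, usr/tst/a.txt, notes.txt, text1',               # capture runs too far
) {
    my $name = len_name($line);
    my $out  = defined $name ? $name : '(no match)';
    print "$out\n";
}
```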
    Rider, Jul 9, 2009
    #8
  9. Rider

    On Thu, 9 Jul 2009 08:00:03 -0700 (PDT), Rider wrote:

    >Here is the input file.
    >===================
    >1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    >2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    >3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3
    >.....
    >....
    >===================


    This might help. It's a construct-your-own recipe; how you use it
    is up to you. It is certainly not a one-liner (or short), but neither is
    real file name parsing. There might be a module you could invoke.
    Or you could use something like:

    /(?:(\/\s*[.-]+.*?)|([a-z0-9_][a-z0-9_ .-]*\.txt))[\s,]+/i and defined $2

    -sln

    ----------------------------
    ## parse_fname_unix.pl
    ## (some rudimentary regex construction)
    ##
    use strict;
    use warnings;

    use constant debug => 1;

    my $start_char = "a-z0-9_";
    my $body_chars = "$start_char .-";
    my $field_seps = "\\s,";
    my $fname = "[$start_char][$body_chars]*";
    my $ext = "txt";
    my $bad_fname = "\/\\s*[.-]+.*?";

    my $qualified_name = qr/(?:($bad_fname)|($fname\.$ext))[$field_seps]+/i;

    print "\n$qualified_name\n";

    while (<DATA>)
    {
        next if (/^\s*$/);

        if (debug) {
            print "\n$_";
            while (/$qualified_name/g)
            {
                print "\tBAD: $1\n" if defined $1;
                print "\tOK: $2\n" if defined $2;
            }
        } else {
            # Keep scanning past BAD matches so that later good names
            # on the same line are still printed.
            while (/$qualified_name/g) {
                print "$2\n" if defined $2;
            }
        }
    }

    __DATA__

    -4, bob, unix/ .txt/File_-4.txt, boston, text1, unix/tst.txt/File_-4a.txt
    -3, bob, unix .txt/File_-3.txt, boston, text1, text2, text3
    -2, bob, unix .txt/.-File_-2.txt, boston, text1, text2, text3
    -1, bob, unix .txt.-File_-1.txt, boston, text1, text2, text3
    0, bob, unixFile0.txt, boston, text1, text2, text3
    1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    3, bob, usr/tst/unix.some.txt.File3.txt, boston, text1, text2, text3
    4, bob, usr/tst.txt/unixFile4.Txt, boston, text1, text2, text3

    --------------------
    output:

    (?i-xsm:(?:(/\s*[.-]+.*?)|([a-z0-9_][a-z0-9_ .-]*\.txt))[\s,]+)

    -4, bob, unix/ .txt/File_-4.txt, boston, text1, unix/tst.txt/File_-4a.txt
            BAD: / .txt/File_-4.txt
            OK: File_-4a.txt

    -3, bob, unix .txt/File_-3.txt, boston, text1, text2, text3
            OK: File_-3.txt

    -2, bob, unix .txt/.-File_-2.txt, boston, text1, text2, text3
            BAD: /.-File_-2.txt

    -1, bob, unix .txt.-File_-1.txt, boston, text1, text2, text3
            OK: unix .txt.-File_-1.txt

    0, bob, unixFile0.txt, boston, text1, text2, text3
            OK: unixFile0.txt

    1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
            OK: unixFile1.txt

    2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
            OK: unixFile2.txt

    3, bob, usr/tst/unix.some.txt.File3.txt, boston, text1, text2, text3
            OK: unix.some.txt.File3.txt

    4, bob, usr/tst.txt/unixFile4.Txt, boston, text1, text2, text3
            OK: unixFile4.Txt
    , Jul 11, 2009
    #9
