A script to separate out file names from the path?

Discussion in 'Perl Misc' started by Rich Grise, Dec 11, 2006.

  1. Rich Grise

    Rich Grise Guest

    I have a collection of about 6000 files that need to be reorganized.
    These have been strewn all over the place, from CDs to various partitions
    and subdirectories on different workstations, to a pile of various
    subdirectories from our Samba server, and what-not.

    They're all on different depths of subdir, and I'm almost certain that
    there's a lot of redundancy - I've got a list that looks something like
    this example:

    /Collection/a/b/c/d/file1
    /Collection/a/b/c/d/file2
    /Collection/a/b/c/d/file3
    /Collection/a/b/c/d/file4
    /Collection/a/b/c/e/file4
    /Collection/a/b/c/e/file5
    /Collection/e/f/g/file4
    /Collection/e/f/g/file5
    /Collection/e/f/g/file6
    /Collection/e/f/g/file7

    and so on. As you can see, they're at different subdir depths.
    What I want to do, if possible, is to take this array, split out
    only the last component (after some unknown number of '/', but
    the last one in the string), put it at the front of a new
    string, then append the original line.

    The ultimate goal is to sort these by filename - I could kill
    a lot of redundancy pretty easily that way.

    But it turns out that what I've been trying to use,
    for (<>) {
    my @line = split(/\//,$_);
    my $count = @line;
    print (@line[$count-1], " : ", $_);
    }

    doesn't seem to accomplish what I think it should. Here's the
    script I've got so far:

    #!/usr/bin/perl

    while (<>) {
    $input = chop($_);
    @line = split(/\//,$input);
    $count = @line;
    print ("count = ", $count, "\n");

    # foreach $item(@line) {
    # print (" item = ", $item);
    # }
    # print ("count = ", $count, " ");

    # for ($i = 0; $i < $count; $i++) {
    # print (" item ", $i, " = ", @line[$i], " ");
    # }

    # $myitem = @line[$count-1];

    # print (@line[$count-1]);

    # print ": ";
    # print $input;
    # print "\n";
    }


    As you can see, I've tried variations on this, and nothing I've
    tried yet has done what I want.

    Here's the input (example):

    /Collection/a/b/c/d/file1
    /Collection/a/b/c/d/file2
    /Collection/a/b/c/d/file3
    /Collection/a/b/c/d/file4
    /Collection/a/b/c/e/file4
    /Collection/a/b/c/e/file5
    /Collection/e/f/g/file4
    /Collection/e/f/g/file5
    /Collection/e/f/g/file6
    /Collection/e/f/g/file7

    And here's what I want the output to look like:

    file1 : /Collection/a/b/c/d/file1
    file2 : /Collection/a/b/c/d/file2
    file3 : /Collection/a/b/c/d/file3
    file4 : /Collection/a/b/c/d/file4
    file4 : /Collection/a/b/c/e/file4
    file5 : /Collection/a/b/c/e/file5
    file4 : /Collection/e/f/g/file4
    file5 : /Collection/e/f/g/file5
    file6 : /Collection/e/f/g/file6
    file7 : /Collection/e/f/g/file7

    Which I could sort, and track down the duplicates.

    But I'm stuck on rearranging the strings. )-;

    Would anyone wish to be so kind as to volunteer to do my homework for me?

    Thanks,
    Rich
    Rich Grise, Dec 11, 2006
    #1
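A minimal sketch for the goal stated in the question: instead of sorting "name : path" lines and eyeballing them, the paths can be grouped in a hash keyed on the file name, which lists every repeated name directly. This assumes one path per entry with the newline already removed; the @paths list is sample data taken from the example, and group_by_name is a hypothetical helper name. A shared name only marks candidate duplicates; the files' contents would still need comparing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Group full paths by their last path component (the file name).
# Returns a hash reference: file name => [ full paths, in input order ].
sub group_by_name {
    my %by_name;
    for my $path (@_) {
        push @{ $by_name{ basename($path) } }, $path;
    }
    return \%by_name;
}

# Sample input (a subset of the example list).
my @paths = qw(
    /Collection/a/b/c/d/file1
    /Collection/a/b/c/d/file4
    /Collection/a/b/c/e/file4
    /Collection/a/b/c/e/file5
    /Collection/e/f/g/file4
    /Collection/e/f/g/file5
);

my $groups = group_by_name(@paths);

# Report only the names that occur more than once.
for my $name (sort keys %$groups) {
    my @where = @{ $groups->{$name} };
    next if @where < 2;
    print "$name :\n";
    print "    $_\n" for @where;
}
```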

  2. David Filmer

    David Filmer Guest

    Rich Grise wrote:
    > A script to separate out file names from the path?


    The module File::Basename is part of your standard Perl distribution.

    --
    The best way to get a good answer is to ask a good question.
    David Filmer (http://DavidFilmer.com)
    David Filmer, Dec 11, 2006
    #2
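A short sketch of three functions File::Basename exports (basename, dirname and fileparse), applied to a path shaped like the ones in the question. The .txt suffix is made up for illustration; the qr/\.[^.]*/ pattern asks fileparse to split off the final dot-suffix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);

my $path = '/Collection/a/b/c/d/file1.txt';

# basename: everything after the last '/'
my $name = basename($path);      # 'file1.txt'

# dirname: everything before the last '/'
my $dir = dirname($path);        # '/Collection/a/b/c/d'

# fileparse: name, directory (keeps its trailing '/'), and any
# suffix matched by the pattern(s) given after the path
my ($stem, $dirs, $suffix) = fileparse($path, qr/\.[^.]*/);
# $stem = 'file1', $dirs = '/Collection/a/b/c/d/', $suffix = '.txt'

print "$name : $path\n";
```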

  3. J. Gleixner

    J. Gleixner Guest

    Rich Grise wrote:
    [...]
    > The ultimate goal is to sort these by filename - I could kill
    > a lot of redundancy pretty easily that way.
    >
    > But it turns out, what I've been trying to do is use
    > for (<>) {
    > my @line = split(/\//,$_);
    > my $count = @line;
    > print (@line[$count-1], " : ", $_);
    > }


    You can use a negative index.

    my @arr = qw(a b c d e);
    print $arr[-1];

    Will print: e

    Note: It's $line[] not @line[].

    And since split returns a list, you could get the last item:

    my $last_item = ( split /\// ) [-1];


    > Would anyone wish to be so kind as to volunteer to do my homework for me?

    No; however, most people will help you learn the language so you can do
    it yourself.
    J. Gleixner, Dec 11, 2006
    #3
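Putting the two suggestions above together (a negative index, and a list slice on split's return value) gives a small helper. last_component is a hypothetical name, and the chomp step assumes lines read from a file still carry their trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the last '/'-separated component of a path, using a list
# slice with a negative index directly on split's result.
sub last_component {
    my ($path) = @_;
    return ( split m{/}, $path )[-1];
}

my @arr = qw(a b c d e);
print $arr[-1], "\n";            # negative index counts from the end: e

for my $line ("/Collection/a/b/c/d/file1\n", "/Collection/e/f/g/file7\n") {
    chomp(my $path = $line);     # drop the newline before splitting
    print last_component($path), " : $path\n";
}
```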
  4. Lew Pitcher

    Lew Pitcher Guest

    Rich Grise wrote:
    > I have a collection of about 6000 files that need to be reorganized.
    > These have been strewn all over the place, from CDs to various partitions
    > and subdirectories on different workstations, to a pile of various
    > subdirectories from our Samba server, and what-not.
    >
    > They're all on different depths of subdir, and I'm almost certain that
    > there's a lot of redundancy - I've got a list that looks something like
    > this example:
    >
    > /Collection/a/b/c/d/file1
    > /Collection/a/b/c/d/file2
    > /Collection/a/b/c/d/file3
    > /Collection/a/b/c/d/file4
    > /Collection/a/b/c/e/file4
    > /Collection/a/b/c/e/file5
    > /Collection/e/f/g/file4
    > /Collection/e/f/g/file5
    > /Collection/e/f/g/file6
    > /Collection/e/f/g/file7
    >
    > and so on; as you can see, they're at different subdir depths;
    > what I want to do, if possible, is to take this array, split out
    > only the last component (after some unknown number of '/', but
    > the last one in the string), put it in the front of a new
    > string, then concatenate the original line;
    >
    > The ultimate goal is to sort these by filename - I could kill
    > a lot of redundancy pretty easily that way.
    >
    > But it turns out, what I've been trying to do is use
    > for (<>) {
    > my @line = split(/\//,$_);
    > my $count = @line;
    > print (@line[$count-1], " : ", $_);
    > }
    >
    > doesn't seem to accomplish what I think it should. Here's the
    > script I've got so far:

    [snip]

    I say: why use complex tools when simple tools will suffice?

    Have you looked at the basename(1) and dirname(1) utilities?

    lpitcher@merlin:~$ basename /Collection/a/b/c/d/file1.a
    file1.a
    lpitcher@merlin:~$ basename /Collection/a/b/c/d/file1
    file1

    lpitcher@merlin:~$ dirname /Collection/a/b/c/d/file1.a
    /Collection/a/b/c/d
    lpitcher@merlin:~$ dirname /Collection/a/b/c/d/file1
    /Collection/a/b/c/d

    Something as simple as

    #!/bin/bash
    echo `basename $1`: $1

    might do the trick.

    HTH
    --
    Lew
    Lew Pitcher, Dec 11, 2006
    #4
  5. John W. Krahn

    Rich Grise wrote:
    > I have a collection of about 6000 files that need to be reorganized.
    > These have been strewn all over the place, from CDs to various partitions
    > and subdirectories on different workstations, to a pile of various
    > subdirectories from our Samba server, and what-not.
    >
    > They're all on different depths of subdir, and I'm almost certain that
    > there's a lot of redundancy - I've got a list that looks something like
    > this example:
    >
    > /Collection/a/b/c/d/file1
    > /Collection/a/b/c/d/file2
    > /Collection/a/b/c/d/file3
    > /Collection/a/b/c/d/file4
    > /Collection/a/b/c/e/file4
    > /Collection/a/b/c/e/file5
    > /Collection/e/f/g/file4
    > /Collection/e/f/g/file5
    > /Collection/e/f/g/file6
    > /Collection/e/f/g/file7
    >
    > and so on; as you can see, they're at different subdir depths;
    > what I want to do, if possible, is to take this array, split out
    > only the last component (after some unknown number of '/', but
    > the last one in the string), put it in the front of a new
    > string, then concatenate the original line;
    >
    > The ultimate goal is to sort these by filename - I could kill
    > a lot of redundancy pretty easily that way.
    >
    > But it turns out, what I've been trying to do is use
    > for (<>) {
    > my @line = split(/\//,$_);
    > my $count = @line;
    > print (@line[$count-1], " : ", $_);


    You are using an array slice when you should be using a scalar:

    Found in /usr/lib/perl5/5.8.6/pod/perlfaq4.pod
    What is the difference between $array[1] and @array[1]?

    And you can use negative numbers to index from the end of the array:

    print "$line[-1] : $_";
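The difference between $line[-1] and @line[-1] only shows up when context matters: a one-element slice on the left of an assignment puts the right-hand side in list context, while a plain element puts it in scalar context. In the sketch below, parts is a hypothetical helper that returns split's result, so it yields the fields in list context and the field count in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a path on '/'. In list context this gives
# the fields; in scalar context split returns how many fields it found.
sub parts { return split m{/}, $_[0] }

my @x;

# Element on the left: the right-hand side runs in scalar context.
$x[0] = parts('a/b/file1');          # 3 (the field count)

# One-element slice on the left: list context, the first field wins.
{
    no warnings 'syntax';            # silence "better written as $x[1]"
    @x[1] = parts('a/b/file1');      # 'a'
}

# Reading the last element: a plain scalar with a negative index.
my @line = parts('/Collection/a/b/c/d/file1');
print "$line[-1]\n";                 # file1
print "$x[0] $x[1]\n";               # 3 a
```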


    > }
    >
    > doesn't seem to accomplish what I think it should. Here's the
    > script I've got so far:
    >
    > #!/usr/bin/perl


    use warnings;
    use strict;

    > while (<>) {
    > $input = chop($_);


    You should use chomp instead of chop.
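One reason to prefer chomp: chop always removes the final character, so on a last line that has no trailing newline it eats part of the file name, while chomp removes only a trailing newline (strictly, the current value of $/). A minimal illustration; via_chop and via_chomp are hypothetical helper names and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: each returns a shortened copy of its argument.
sub via_chop  { my $s = shift; chop $s;  return $s }   # drops the last char, whatever it is
sub via_chomp { my $s = shift; chomp $s; return $s }   # drops a trailing $/ ("\n") only

# Sample data: the final line of a file often lacks a newline.
my @lines = ("/Collection/e/f/g/file6\n", "/Collection/e/f/g/file7");

for my $raw (@lines) {
    printf "chop: %-26s chomp: %s\n", via_chop($raw), via_chomp($raw);
}
```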

    > @line = split(/\//,$input);
    > $count = @line;
    > print ("count = ", $count, "\n");
    >
    > # foreach $item(@line) {
    > # print (" item = ", $item);
    > # }
    > # print ("count = ", $count, " ");
    >
    > # for ($i = 0; $i < $count; $i++) {
    > # print (" item ", $i, " = ", @line[$i], " ");
    > # }
    >
    > # $myitem = @line[$count-1];
    >
    > # print (@line[$count-1]);
    >
    > # print ": ";
    > # print $input;
    > # print "\n";
    > }



    #!/usr/bin/perl
    use warnings;
    use strict;

    use File::Basename;

    print map /\0(.+)/s,
    sort
    map basename( $_ ) . "\0$_",
    <>;

    __END__
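The program above is a decorate-sort-undecorate: it glues each basename to its path with a "\0", sorts the glued strings, then strips the key off again, printing the paths ordered by file name. A variant that keeps the name in front, as in the requested "name : path" output, can carry the key in an array reference instead (the usual Schwartzian transform). sorted_by_name is a hypothetical helper and the paths are sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Decorate each path with its basename, sort on that cached key
# (ties broken by the full path), then format as "name : path".
sub sorted_by_name {
    return map  { join ' : ', @$_ }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ basename($_), $_ ] }
           @_;
}

my @paths = qw(
    /Collection/e/f/g/file4
    /Collection/a/b/c/d/file1
    /Collection/a/b/c/e/file4
    /Collection/a/b/c/d/file4
);

print "$_\n" for sorted_by_name(@paths);
```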




    John
    --
    Perl isn't a toolbox, but a small machine shop where you can special-order
    certain sorts of tools at low cost and in short order. -- Larry Wall
    John W. Krahn, Dec 11, 2006
    #5
  6. Paul Lalli

    Paul Lalli Guest

    Rich Grise wrote:
    > for (<>) {
    > my @line = split(/\//,$_);
    > my $count = @line;
    > print (@line[$count-1], " : ", $_);
    > }
    >
    > doesn't seem to accomplish what I think it should.


    No, that would have worked perfectly well. It's just not at all what
    you did.

    > Here's the
    > script I've got so far:
    >
    > #!/usr/bin/perl
    >
    > while (<>) {
    > $input = chop($_);


    perldoc -f chop
    chop VARIABLE
    chop( LIST )
    chop Chops off the last character of a string and returns
    the character chopped.

    Did you bother printing $input to see what it was? It's not the line
    minus the trailing newline. It's the trailing newline.

    You should be using chomp anyway.

    while (my $input = <>) {
    chomp $input;
    #etc
    }
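What chop and chomp actually return, and where the shortened string ends up, shown on one line from the example list. The variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "/Collection/a/b/c/d/file1\n";

my $copy1 = $line;
my $ret_chop = chop($copy1);     # returns the removed character: "\n"

my $copy2 = $line;
my $ret_chomp = chomp($copy2);   # returns how many characters were removed: 1

# The shortened string is the variable itself, not the return value.
print "chop returned ", ($ret_chop eq "\n" ? 'a newline' : "'$ret_chop'"), "\n";
print "chomp returned $ret_chomp\n";
print "line is now '$copy2'\n";
```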

    Regardless, use File::Basename, as another responder suggested. This
    wheel has already been written.

    Paul Lalli
    Paul Lalli, Dec 11, 2006
    #6
  7. Uri Guttman

    Uri Guttman Guest

    >>>>> "LP" == Lew Pitcher <> writes:

    LP> I say why use complex tools when simple tools will suffice

    LP> Have you looked at the basename(1) and dirname(1) utilities?

    i say why use external shell commands when File::Basename is a core
    module?

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Dec 11, 2006
    #7
  8. Rich Grise

    Rich Grise Guest

    On Mon, 11 Dec 2006 12:19:55 -0800, usenet wrote:

    > Rich Grise wrote:
    >> A script to separate out file names from the path?

    >
    > The module File::Basename is part of your standard Perl distribution.


    Sorry for the bother - I just did it the old way in C, which I know is
    heresy for the perl group. =:-O

    /* relist.c */
    /* reformats strings. */

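    #include <stdio.h>
    #include <string.h>  /* for strrchr() */

    char buffer[512];
    char * bufp;

    int main() {
    while (bufp = gets(buffer)) {
    bufp = strrchr(buffer, '/');
    if (bufp == NULL)    /* no '/' in this line: nothing to split off */
    continue;
    printf ("item ID = %s, data = %s\n", bufp + 1, buffer);
    }
    return 0;
    }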
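For comparison, the same strrchr() idea in Perl uses rindex() to find the last '/' and substr() to take what follows it. rindex() returns -1 when there is no '/', so the whole string comes back in that case (where strrchr() would return NULL, which has to be checked before use). A sketch only; after_last_slash is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Everything after the last '/', or the whole string if there is none.
sub after_last_slash {
    my ($s) = @_;
    my $pos = rindex($s, '/');   # -1 if no '/' is present
    return substr($s, $pos + 1);
}

for my $path ('/Collection/a/b/c/d/file1', 'loose-file') {
    printf "item ID = %s, data = %s\n", after_last_slash($path), $path;
}
```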

    Thanks!
    Rich
    Rich Grise, Dec 11, 2006
    #8
  9. Dr.Ruud

    Dr.Ruud Guest

    Rich Grise schreef:

    > #include <stdio.h>
    >
    > char buffer[512];
    > char * bufp;
    >
    > int main() {
    > while (bufp = gets(buffer)) {
    > bufp = strrchr(buffer, '/');
    > printf ("item ID = %s, data = %s\n", bufp + 1, buffer);
    > }
    > }


    Perl version:

    while ( <> =~ m~(.+/(.+))~ ) {
    printf "item ID = %s, data = %s\n", $2, $1 ;
    }
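Because the match sits in the while condition, that loop stops at the first input line containing no '/' (a blank line, for instance), not only at end of input. A variant that skips such lines instead, written over a list so it can be tried without a file; item_lines is a hypothetical helper and the input is sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format every line that contains a '/', skipping the rest.
# '.' does not match "\n", so the captures exclude the newline.
sub item_lines {
    my @out;
    for my $line (@_) {
        next unless $line =~ m~(.+/(.+))~;   # $1 = whole path, $2 = last component
        push @out, sprintf "item ID = %s, data = %s", $2, $1;
    }
    return @out;
}

my @input = ("/Collection/a/b/c/d/file1\n", "README\n", "/Collection/e/f/g/file7\n");
print "$_\n" for item_lines(@input);
```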

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Dec 11, 2006
    #9
  10. Tad McClellan

    ["Followup-To:" header set to comp.lang.perl.misc.]

    Rich Grise <> wrote:


    > Here's the input (example):
    >
    > /Collection/a/b/c/d/file1
    > /Collection/a/b/c/d/file2
    > /Collection/a/b/c/d/file3
    > /Collection/a/b/c/d/file4
    > /Collection/a/b/c/e/file4
    > /Collection/a/b/c/e/file5
    > /Collection/e/f/g/file4
    > /Collection/e/f/g/file5
    > /Collection/e/f/g/file6
    > /Collection/e/f/g/file7
    >
    > And here's what I want the output to look like:
    >
    > file1 : /Collection/a/b/c/d/file1
    > file2 : /Collection/a/b/c/d/file2
    > file3 : /Collection/a/b/c/d/file3
    > file4 : /Collection/a/b/c/d/file4
    > file4 : /Collection/a/b/c/e/file4
    > file5 : /Collection/a/b/c/e/file5
    > file4 : /Collection/e/f/g/file4
    > file5 : /Collection/e/f/g/file5
    > file6 : /Collection/e/f/g/file6
    > file7 : /Collection/e/f/g/file7



    perl -pe 's/(.*\/(.*))/$2 : $1/' input.file
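How the one-liner works: -p runs the substitution on every input line and prints the result. In (.*\/(.*)) the first .* is greedy, so it backs off only as far as the last '/', leaving the file name for the inner (.*); since '.' does not match "\n", the newline stays at the end. The same substitution wrapped in a hypothetical helper, name_first, so it can be checked on a single line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite "/dir/.../name" as "name : /dir/.../name".
# Lines without a '/' do not match and come back unchanged.
sub name_first {
    my ($line) = @_;
    $line =~ s/(.*\/(.*))/$2 : $1/;
    return $line;
}

print name_first("/Collection/a/b/c/d/file1\n");   # file1 : /Collection/a/b/c/d/file1
```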


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Dec 12, 2006
    #10
  11. Ted Zlatanov

    Ted Zlatanov Guest

    On 11 Dec 2006, wrote:

    > On Mon, 11 Dec 2006 12:19:55 -0800, usenet wrote:
    >> Rich Grise wrote:
    >>> A script to separate out file names from the path?

    >>
    >> The module File::Basename is part of your standard Perl distribution.

    >
    > Sorry for the bother - I just did it the old way in C, which I know is
    > heresy for the perl group. =:-O
    >
    > /* relist.c */
    > /* reformats strings. */
    >
    > #include <stdio.h>
    >
    > char buffer[512];
    > char * bufp;
    >
    > int main() {
    > while (bufp = gets(buffer)) {
    > bufp = strrchr(buffer, '/');
    > printf ("item ID = %s, data = %s\n", bufp + 1, buffer);
    > }
    > }


    It's not heresy, just not interesting--most of us have written C and
    much prefer Perl. Also, you shouldn't use gets(). Ever. Henry
    Spencer explains it better than I could:

    http://isthe.com/chongo/tech/comp/c/10com.html

    Ted
    Ted Zlatanov, Dec 12, 2006
    #11
  12. Ted Zlatanov

    Ted Zlatanov Guest

    On 11 Dec 2006, wrote:

    > I say why use complex tools when simple tools will suffice


    Excellent point. But also, you have to know the complex ways in which
    simple tools can fail.

    > Something as simple as
    >
    > #!/bin/bash
    > echo `basename $1`: $1


    # touch 'a b'

    # cat b.sh
    #!/bin/bash
    echo `basename $1`: $1

    # ./b.sh 'a b'
    a: a b

    You need the second line to be

    echo "$(basename "$1"): $1"

    and even that may have trouble on systems like Windows that don't have
    a `basename' program available by default.
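Doing the same job inside Perl, as suggested elsewhere in the thread, sidesteps both problems: each command-line argument arrives in @ARGV as a single string, so 'a b' stays intact, and no external basename program is needed. By default File::Basename picks its path rules from the platform Perl is running on. A minimal sketch; label is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# "name: path" for one path; no shell word splitting is involved.
sub label {
    my ($path) = @_;
    return basename($path) . ": $path";
}

print label($_), "\n" for @ARGV;
```

Saved as, say, label.pl (a made-up file name), running `perl label.pl /Collection/a/b/c/d/file1 'a b'` prints `file1: /Collection/a/b/c/d/file1` and then `a b: a b`.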

    Ted
    Ted Zlatanov, Dec 12, 2006
    #12
  13. Rich Grise

    Rich Grise Guest

    Thanks All! was: Re: A script to separate out file names from the path?

    On Mon, 11 Dec 2006 20:15:53 +0000, Rich Grise wrote:

    > I have a collection of about 6000 files that need to be reorganized.


    After reading the wealth of responses, I decided to go ahead and break
    form and reply to myself, because I want to say thank you to each and
    every one of you, but I'm too lazy to type this six times. ;-)

    Thanks!
    Rich
    Rich Grise, Dec 20, 2006
    #13
