how to read file from sub-directories and do an average?

Discussion in 'Perl Misc' started by Ross, Aug 19, 2005.

  1. Ross

    Ross Guest

    I have the following directories under a dir called raw data-1, and under
    each subdir, say, 4601-4.SMP, there is a single file under there. indeed
    that single file has a fixed format and i'm going to extract numerical
    values there to write to a new file along with two from 4601-4B.SMP and
    4601-4C.SMP. Since a user does not follow nomenclature strictly, sometimes
    he names a dir, say, 4601-4A.SMP instead of 4601-4.SMP, how could i achieve
    extracting 3 files and write to a single file? Finally i would find the
    average of the numerical values obtained from the 3 files.

    admin/home/admin> ls -al raw\ data-1/
    total 96
    drwxr-xr-x 23 admin admin 4096 Aug 20 03:59 .
    drwxr-xr-x 9 admin admin 4096 Aug 20 03:59 ..
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4594-3.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4594-3B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4594-3C.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4601-4.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4601-4B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4601-4C.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4605-5.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4605-5B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4605-5C.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4612-2.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4612-2B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4612-2C.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4614-6.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4614-6B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4614-6C.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4618-1.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4618-1B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4618-1C.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4620-1.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4620-1B.SMP
    drwxr-xr-x 2 admin admin 4096 Aug 20 03:59 4620-1C.SMP
     
    Ross, Aug 19, 2005
    #1

  2. Gunnar Hjalmarsson, Aug 19, 2005
    #2

  3. Ross

    Paul Lalli Guest

    Ross wrote:
    > I have the following directories under a dir called raw data-1, and under
    > each subdir, say, 4601-4.SMP, there is a single file under there. indeed
    > that single file has a fixed format and i'm going to extract numerical
    > values there to write to a new file along with two from 4601-4B.SMP and
    > 4601-4C.SMP. Since a user does not follow nomenclature strictly, sometimes
    > he names a dir, say, 4601-4A.SMP instead of 4601-4.SMP, how could i achieve
    > extracting 3 files and write to a single file? Finally i would find the
    > average of the numerical values obtained from the 3 files.


    What have you tried so far? What part of this do you need help with?

    opening and reading a directory?
    perldoc -f opendir
    perldoc -f readdir

    opening and reading a file?
    perldoc -f open
    perldoc perlopentut
    perldoc perlop ("I/O Operators")

    Saving the values in a variable?
    perldoc perldata

    Counting, summing, and averaging a value?
    perldoc perlop

    Make an attempt - preferably following the posting guidelines of this
    group - and if it doesn't do what you want, feel free to ask this group
    for help.
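
As a rough illustration of the reading and averaging steps listed above, here is a minimal sketch. It assumes a simplified layout of one number per line, which is not the real file format in this thread, and the sub names `read_numbers` and `mean` are hypothetical.

```perl
use strict;
use warnings;

# Read one number per line from $path (an assumed, simplified file layout).
sub read_numbers {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Could not open '$path': $!";
    my @numbers;
    while (my $line = <$in>) {
        chomp $line;
        # Keep only lines that look like a plain decimal number.
        push @numbers, $line if $line =~ /^-?\d+(?:\.\d+)?$/;
    }
    close $in;
    return @numbers;
}

# Arithmetic mean of a list; returns undef for an empty list.
sub mean {
    my @values = @_;
    return undef unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}
```

Returning undef for an empty list avoids a division by zero when a file contains no usable values.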

    Paul Lalli
     
    Paul Lalli, Aug 19, 2005
    #3
  4. Ross

    Ross Guest

    the problems are:

    1) my codes are clumsy,
    opendir(WORKINGDIR, $ARGV[0]) || die ("unable to open dir named $ARGV[0]");

    while ($inputdirname = readdir(WORKINGDIR)) {
    chdir ($ARGV[0]);

    opendir(SUBDIR, $inputdirname) || die ("unable to open dir named
    $inputdirname");

    $inputfilename = readdir(RUBDIR);
    chdir ($inputdirname);

    open(IN, $inputfilename) || die "Could not open $inputfilename\n";

    ...
    }

    2)don't know how to write a file into columns, i separately wrote a script
    which can process the input file into the following example tab-delimited
    format,

    "4594-3A"
            Concentration (mg/mL)  Normalized Concentration
    ASP     15.9789                8.873
    THR     5.6596                 3.143
    SER     27.4199                15.226
    GLU     23.0988                12.826
    PRO     7.0019                 3.888
    GLY     10.2960                5.717
    ALA     33.3880                18.540
    CYS     1.9538                 1.085
    VAL     6.6856                 3.713
    MET     1.9792                 1.099
    ILE     3.4556                 1.919
    LEU     5.4778                 3.041
    TYR     2.1671                 1.204
    PHE     2.5160                 1.397
    HIS     2.2561                 1.253
    LYS     21.2256                11.786
    ARG     3.9567                 2.197
    TOTALS  180.0864               100.000


    and the final averaged file should be in this way, as you can see, the data
    are added in a column fashion.

         4594-3A  4594-3B  4594-3C  average          average (to 2 d.p.)
    ASP  15.045   15.082   14.836   14.98767    ASP  14.99
    THR  2.626    2.577    2.595    2.599333    THR  2.60
    SER  19.276   19.543   19.499   19.43933    SER  19.44
    GLU  19.343   19.81    19.651   19.60133    GLU  19.60
    PRO  2.801    2.866    2.881    2.849333    PRO  2.85
    GLY  5.031    5.149    5.074    5.084667    GLY  5.08
    ALA  18.615   18.616   18.533   18.588      ALA  18.59
    CYS  0        0        0        0           CYS  0.00
    VAL  2.823    2.722    2.742    2.762333    VAL  2.76
    MET  1.05     0.785    0.836    0.890333    MET  0.89
    ILE  1.66     1.642    1.627    1.643       ILE  1.64
    LEU  2.8      2.534    2.571    2.635       LEU  2.64
    TYR  1.352    1.153    1.175    1.226667    TYR  1.23
    PHE  1.298    1.067    1.105    1.156667    PHE  1.16
    HIS  1.03     0.98     0.989    0.999667    HIS  1.00
    LYS  0.862    0.808    0.83     0.833333    LYS  0.83
    ARG  1.739    1.608    1.656    1.667667    ARG  1.67
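
One possible way to produce that column layout is to keep each sample's values in a hash and emit one tab-delimited row per residue. This is only a sketch: the sub name `average_rows` and the shape of `%data` are assumptions, and only two residues from the table above are used as sample input.

```perl
use strict;
use warnings;

# Build tab-delimited rows: residue name, one column per sample, then the
# mean rounded to 2 decimal places.
# $data maps sample label => { residue name => value } (hypothetical layout).
sub average_rows {
    my ($data, $samples, $residues) = @_;
    my @rows = ( join("\t", '', @$samples, 'average') );
    for my $res (@$residues) {
        my @vals = map { $data->{$_}{$res} } @$samples;
        my $sum = 0;
        $sum += $_ for @vals;
        push @rows, join("\t", $res, @vals, sprintf('%.2f', $sum / @vals));
    }
    return @rows;
}

my %data = (
    '4594-3A' => { ASP => 15.045, THR => 2.626 },
    '4594-3B' => { ASP => 15.082, THR => 2.577 },
    '4594-3C' => { ASP => 14.836, THR => 2.595 },
);
print "$_\n" for average_rows(\%data, [ sort keys %data ], [qw(ASP THR)]);
```

Passing the residue order explicitly keeps the rows in the order of the input file, since a hash does not remember insertion order.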
     
    Ross, Aug 19, 2005
    #4
  5. Ross

    Guest

    "Ross" <> wrote:
    > I have the following directories under a dir called raw data-1, and under
    > each subdir, say, 4601-4.SMP, there is a single file under there. indeed
    > that single file has a fixed format and i'm going to extract numerical
    > values there to write to a new file along with two from 4601-4B.SMP and
    > 4601-4C.SMP. Since a user does not follow nomenclature strictly,
    > sometimes he names a dir, say, 4601-4A.SMP instead of 4601-4.SMP, how
    > could i achieve extracting 3 files and write to a single file?


    IMHO, you can't. Either a user does what he is supposed to, or he doesn't.
    If the user doesn't, then all you can do is guess. What if your user
    decides to not follow nomenclature strictly by sticking the A before the 2nd
    number segment rather than after it? What if he fails to follow
    nomenclature strictly by mis-spelling 4601-4A.SMP as 4603-4A.SMP? Either
    it is allowed to add a letter after the 2nd number segment, in which case
    the user *is* following nomenclature strictly, or we are just playing a
    guessing game.

    Anyway, I've handled similar situations something like:

    my %set;
    ## Find the set of all groups to be averaged over
    my @files = glob "*SMP";
    foreach (@files) {
        /^(\d+-\d+)/ or die "invalid format $_";
        $set{$1} = ();    # key on the captured group prefix, not the whole name
    }

    foreach my $group (keys %set) {
        foreach my $member ( grep /\Q$group\E\D/, @files ) {
            ## open $member and do whatever you do
        }
        ## do whatever output you need for the group
    }

    The \D in the regex makes sure that 4594-32.SMP is not considered part
    of the set "4594-3".

    Xho

     
    , Aug 19, 2005
    #5
  6. Ross

    Guest

    "Ross" <> wrote:
    > I have the following directories under a dir called raw data-1, and under
    > each subdir, say, 4601-4.SMP, there is a single file under there. indeed
    > that single file has a fixed format and i'm going to extract numerical
    > values there to write to a new file along with two from 4601-4B.SMP and
    > 4601-4C.SMP. Since a user does not follow nomenclature strictly,
    > sometimes he names a dir, say, 4601-4A.SMP instead of 4601-4.SMP, how
    > could i achieve extracting 3 files and write to a single file?


    IMHO, you can't. Either a user does what he is supposed to, or he doesn't.
    If the user doesn't, then all you can do is guess. What if your user
    decides to not follow nomenclature strictly by sticking the A before the 2nd
    number segment rather than after it? What if he fails to follow
    nomenclature strictly by mis-spelling 4601-4A.SMP as 4603-4A.SMP? Either
    it is allowed to add a letter after the 2nd number segment, in which case
    the user *is* following nomenclature strictly, or we are just playing a
    guessing game.

    Anyway, I've handled similar situations something like:

    my %set;
    ## Find the set of all groups to be averaged over
    my @files = glob "*SMP";
    foreach (@files) {
        /^(\d+-\d+)/ or die "invalid format $_";
        $set{$1} = ();    # key on the captured group prefix, not the whole name
    }

    foreach my $group (keys %set) {
        foreach my $member ( grep /^\Q$group\E\D/, @files ) {
            ## open $member and do whatever you do
        }
        ## do whatever output you need for the group
    }

    The \D in the regex makes sure that 4594-32.SMP is not considered part
    of the group "4594-3". (And the ^ makes sure that 594-3.SMP isn't
    considered part of the group 4594-3)
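
A variation on the same idea, sketched here under the assumption that every valid name starts with digits, a hyphen, and digits: the groups can be collected in a single pass into a hash of arrays. The sub name `group_by_prefix` is hypothetical, and `4594-32.SMP` is added to the sample names only to show that it lands in its own group.

```perl
use strict;
use warnings;

# Group directory names by their leading "digits-digits" prefix.
# Returns a hash reference: prefix => [ member names ].
sub group_by_prefix {
    my @names = @_;
    my %groups;
    for my $name (@names) {
        # The \D after the capture stops "4594-32.SMP" from joining "4594-3".
        if ($name =~ /^(\d+-\d+)\D/) {
            push @{ $groups{$1} }, $name;
        }
        else {
            warn "Skipping name with unexpected format: $name\n";
        }
    }
    return \%groups;
}

my $groups = group_by_prefix(qw(
    4594-3.SMP 4594-3B.SMP 4594-3C.SMP 4594-32.SMP 4601-4A.SMP 4601-4B.SMP
));
for my $prefix (sort keys %$groups) {
    print "$prefix: @{ $groups->{$prefix} }\n";
}
```

Because the optional letter is simply ignored when forming the key, `4601-4.SMP` and `4601-4A.SMP` would both fall into the group `4601-4`.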

    Xho

     
    , Aug 19, 2005
    #6
  7. Tad McClellan wrote:

    > Ross <> wrote:
    >> chdir ($ARGV[0]);

    >
    >
    > You should probably ensure that it is an existing directory
    > before you try to open it:
    >
    > die "'$inputdirname' is not a directory" unless -d $inputdirname;


    I disagree. It's better to just try the chdir() and die if it fails.
    There are numerous reasons why this is better, to do with race conditions
    and permissions.

    > Use copy/paste or your editor's "import" function rather than
    > attempting to type in your code. If you make a typo you will get
    > followups about your typos instead of about the question you are
    > trying to get answered.


    Fair point.

    > It is profoundly rude of you to bother thousands of people with
    > such silliness.
    >
    > Are you being rude on purpose? It sure looks like it.


    Tad, I think such strong statements are a little over the top for a
    first offence. (If this wasn't a first offence then they are justified).
     
    Brian McCauley, Aug 20, 2005
    #7
  8. Ross

    Ross Guest

    if i don't sort the dir names, it cannot guarantee to take average
    correctly, if so like using:

    $curdir = `pwd`;
    chop $curdir;

    $curdir = $curdir . "/$ARGV[0]";


    @inputdirname = readdir(WORKINGDIR);

    foreach $inputdirname (sort @inputdirname) {

    print $curdir.$inputdirname; <STDIN>;
    opendir(SUBDIR, $curdir.$inputdirname) || die ("unable to open dir
    named $inputdirname $!");

    <processing>

    chdir ($curdir);

    <counting>
    }

    . and .. are taken into account
     
    Ross, Aug 20, 2005
    #8
  9. Brian McCauley <> wrote:
    > Tad McClellan wrote:
    >> Ross <> wrote:


    [code with typos]

    >> Use copy/paste or your editor's "import" function rather than
    >> attempting to type in your code.


    > Fair point.
    >
    >> It is profoundly rude of you to bother thousands of people with
    >> such silliness.
    > >
    >> Are you being rude on purpose? It sure looks like it.

    >
    > Tad, I think such strong statements are a little over the top for a
    > first offence. (If this wasn't a first offence then they are justified).



    I pointed out the futility of "paraphrased" code a month ago:

    Message-Id: <>

    I mentioned providing attributions twice before this third
    (unattributed) followup from the OP.

    Another poster pointed this OP to the posting guidelines over a month ago.

    And I get the feeling that clp.misc is tried before any other resource
    (rather than after all other resources.)


    I wouldn't go to assuming "on purpose" on a first offence either.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Aug 20, 2005
    #9
  10. Ross <> wrote:

    > $curdir = `pwd`;
    > chop $curdir;



    You should not use chop() to remove newlines.

    You should use chomp() to remove newlines.
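
A small sketch of the difference, using a made-up path string rather than real `pwd` output:

```perl
use strict;
use warnings;

my $with_newline    = "/home/admin\n";
my $without_newline = "/home/admin";

# chomp removes a trailing newline only if one is present.
chomp(my $chomped_nl    = $with_newline);
chomp(my $chomped_plain = $without_newline);

# chop removes the last character, whatever it is.
chop(my $chopped_nl    = $with_newline);
chop(my $chopped_plain = $without_newline);

print "chomp: '$chomped_nl' '$chomped_plain'\n";
print "chop:  '$chopped_nl' '$chopped_plain'\n";
```

With chop, the string that had no trailing newline loses its final letter, which is why chomp is the safer choice here. The core Cwd module's `getcwd` is another option, since it returns the current directory without a trailing newline.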


    > @inputdirname = readdir(WORKINGDIR);
    > foreach $inputdirname (sort @inputdirname) {



    There is no need for a temporary array:

    foreach $inputdirname (sort readdir WORKINGDIR) {

    Or, since you should have "use strict" turned on by now:

    foreach my $inputdirname (sort readdir WORKINGDIR) {


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Aug 20, 2005
    #10
  11. Ross <> wrote:

    > $curdir = $curdir . "/$ARGV[0]";


    > opendir(SUBDIR, $curdir.$inputdirname) || die ("unable to open dir

    ^^^ no slash character?
    > named $inputdirname $!");



    Don't you need a directory separator character between $curdir
    and $inputdirname?

    Your diagnostic message is misleading, you should have the same name
    there as used in the opendir():

    opendir(SUBDIR, "$curdir/$inputdirname") or
    die "unable to open dir named '$curdir/$inputdirname' $!";


    > chdir ($curdir);



    You should check the return value to ensure that you actually
    got what you asked for:

    chdir $curdir or die "could not change to '$curdir' $!";


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Aug 20, 2005
    #11
  12. Tad McClellan wrote:
    > Brian McCauley <> wrote:
    >
    >>Tad McClellan wrote:
    >>>
    >>>Are you being rude on purpose? It sure looks like it.

    >>
    >>Tad, I think such strong statements are a little over the top for a
    >>first offence. (If this wasn't a first offence then they are justified).

    >
    > I wouldn't go to assuming "on purpose" on a first offence either.


    OK, I'm satisfied that your criticism of the OP was indeed justified.
    However we should avoid giving the newcomers the impression that the
    Perl community is an unfriendly and unforgiving place.

    Could I humbly suggest it would have been better to have said...

    This has been explained to you before. Are you being rude on purpose?
    It sure looks like it.
     
    Brian McCauley, Aug 20, 2005
    #12
  13. Ross

    Ross Guest

    Thanks Brian and sorry Tad. Still i can't quite get what you are talking
    about. Besides the typo, it seems there are some etiquettes of posting here,
    where should i find them? I once encountered this situation before in
    another newsgroup, is that every newsgroup having their rules so a newcomer
    had better check them up first? if so, where are they?
     
    Ross, Aug 21, 2005
    #13
  14. Ross

    Ross Guest

    "Tad McClellan" <> wrote in message
    news:...
    > Ross <> wrote:
    >
    >> $curdir = `pwd`;
    >> chop $curdir;

    >
    >
    > You should not use chop() to remove newlines.
    >
    > You should use chomp() to remove newlines.
    >

    Thanks for letting me know there is a better (in a sense that's what i want)
    function.
     
    Ross, Aug 21, 2005
    #14
  15. Ross

    Scott Bryce Guest

    Ross wrote:

    > it seems there are some etiquettes of posting here, where should i
    > find them?


    I haven't been following this thread, so I don't know if this has been
    explained to you.

    There are general rules for posting to newsgroups. Some newsgroups are
    more lax than others about the rules. In some newsgroups, particularly
    the busier technical newsgroups like this one, you will be expected to
    play by the rules. It makes it easier for the people here to help you.

    Also, many newsgroups, this one included, have posting guidelines that
    you will be expected to follow. Tad posts them to this group about twice
    a week. You can also find them here:

    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

    This newsgroup gets a lot of traffic. There are some very knowledgeable
    people here who devote a lot of time to helping others. People who don't
    follow the guidelines take up more of other people's time, and will
    eventually get less and less help.

    > I once encountered this situation before in another newsgroup, is
    > that every newsgroup having their rules so a newcomer had better
    > check them up first? if so, where are they?


    Every newsgroup is different. It is a good idea to read a couple of
    weeks' worth of posts before you post to a newsgroup for the first time.
    That will help you get a feel for what is expected. As you are reading,
    look for a link to posting guidelines or a FAQ.
     
    Scott Bryce, Aug 21, 2005
    #15
  16. Ross

    Ross Guest

    Is there any built-in function/parameters in Perl not to take . and .. into
    account when opening all the subdirectories?

    When i run the code:

    the error appears:

    unable to open dir named /home/sunlab/AAA/Reb/rawdat/4601-4.SMP No such file
    or directory at <the absolute path for this perl>/SMP2XLSAVG2.pl line 42

    <the absolute path for this perl> is replaced by me.

    indeed when ls -al rawdat

    drwxr-xr-x 2 sunlab 4096 Aug 21 12:25 4601-4.SMP/


    I've tried both the with and without slash at the end versions.
     
    Ross, Aug 21, 2005
    #16
  17. Ross

    Ross Guest

    "Scott Bryce" <> wrote in message
    news:...
    > Ross wrote:
    >
    >> it seems there are some etiquettes of posting here, where should i
    >> find them?

    >
    > I haven't been following this thread, so I don't know if this has been
    > explained to you.
    >
    > There are general rules for posting to newsgroups. Some newsgroups are
    > more lax than others about the rules. In some newsgroups, particularly
    > the busier technical newsgroups like this one, you will be expected to
    > play by the rules. It makes it easier for the people here to help you.
    >
    > Also, many newsgroups, this one included, have posting guidelines that
    > you will be expected to follow. Tad posts them to this group about twice
    > a week. You can also find them here:
    >
    > http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    >
    > This newsgroup gets a lot of traffic. There are some very knowledgeable
    > people here who devote a lot of time to helping others. People who don't
    > follow the guidelines take up more of other people's time, and will
    > eventually get less and less help.
    >
    >> I once encountered this situation before in another newsgroup, is
    >> that every newsgroup having their rules so a newcomer had better
    >> check them up first? if so, where are they?

    >
    > Every newsgroup is different. It is a good idea to read a couple of weeks'
    > worth of posts before you post to a newsgroup for the first time. That
    > will help you get a feel for what is expected. As you are reading, look
    > for a link to posting guidelines or a FAQ.


    oh, i traced past messages and find out a term "attribution", now i
    understand what it means and thanks for directing me to the link
     
    Ross, Aug 21, 2005
    #17
  18. Ross <> wrote:

    > Is there any built-in function/parameters in Perl not to take . and .. into
    > account when opening all the subdirectories?



    No, but there _is_ a way to avoid processing them. :)

    while ( my $item = readdir DIR ) {

        next if $item eq '.' or $item eq '..';
        # next if $item =~ /^\./; # skip ALL items that start with dot

        # process non-dot files here
    }
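
Putting the pieces together, here is a sketch of walking the subdirectories and picking up the file inside each one. It assumes each subdirectory holds at least one regular file and takes the first in sorted order; the sub name `files_in_subdirs` is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Return a hash mapping each subdirectory name under $top to the full path
# of the first regular file found inside it ("." and ".." are skipped).
sub files_in_subdirs {
    my ($top) = @_;
    my %file_for;
    opendir(my $top_dh, $top) or die "Cannot open '$top': $!";
    for my $sub (sort { $a cmp $b } readdir($top_dh)) {
        next if $sub eq '.' or $sub eq '..';
        my $sub_path = File::Spec->catdir($top, $sub);
        next unless -d $sub_path;
        opendir(my $sub_dh, $sub_path) or die "Cannot open '$sub_path': $!";
        # readdir returns bare names, so rebuild the full path before testing.
        my ($first) = grep { -f File::Spec->catfile($sub_path, $_) }
                      sort { $a cmp $b } readdir($sub_dh);
        closedir $sub_dh;
        $file_for{$sub} = File::Spec->catfile($sub_path, $first)
            if defined $first;
    }
    closedir $top_dh;
    return %file_for;
}
```

Building full paths with File::Spec means no chdir is needed, so a failure in one subdirectory cannot leave the script in an unexpected working directory.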


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Aug 21, 2005
    #18
  19. Ross

    Anno Siegel Guest

    Brian McCauley <> wrote in comp.lang.perl.misc:
    >
    >
    > Tad McClellan wrote:
    >
    > > Ross <> wrote:
    > >> chdir ($ARGV[0]);

    > >
    > >
    > > You should probably ensure that it is an existing directory
    > > before you try to open it:
    > >
    > > die "'$inputdirname' is not a directory" unless -d $inputdirname;

    >
    > I disagree. It's better to just try the chdir() and die if it fails.
    > There are numerous reasons why this is better, to do with race conditions
    > and permissions.


    Apart from that, $inputdirname is coming directly out of a readdir().
    The possibility that chdir($inputdirname) fails because $inputdirname
    isn't a directory is remote.

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Aug 23, 2005
    #19
  20. Ross

    Anno Siegel Guest

    Tad McClellan <> wrote in comp.lang.perl.misc:
    > Ross <> wrote:
    >
    > > Is there any built-in function/parameters in Perl not to take . and .. into
    > > account when opening all the subdirectories?

    >
    >
    > No, but there _is_ a way to avoid processing them. :)
    >
    > while ( my $item = readdir DIR ) {
    >
    > next if $item eq '.' or $item eq '..';
    > # next if $item =~ /^\./; # skip ALL items that start with dot
    >
    > # process non-dot files here
    > }


    File::Spec can even do that portably (untested):

    use File::Spec::Functions qw( no_upwards );
    for my $item ( no_upwards readdir DIR ) {
        # no "." and ".." here
    }

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Aug 23, 2005
    #20