Read string from multiple files, output ordered by a 2nd file

Discussion in 'Perl Misc' started by Scott Bass, Mar 21, 2005.

  1. Scott Bass

    Scott Bass Guest

    Hi,

    Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
    bunch of whitespace> Page 1 of x"

    What I need perl to spit out is:

    f1 /* Table 1.1.1 */
    f2 /* Table 1.1.2 */
    f3 /* Table 1.1.3.1.5 */

    etc.

    In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
    spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
    asterisk slash"

    However, I would also like the output sorted by the data in a 2nd file. So,
    if that 2nd file is:

    Table 1.1
    Table 1.5
    Table 2.1
    Table 3.1
    Table 1.2
    Table 2.2
    Table 3.2
    etc.

    then I would like the output sorted by the order as found in that 2nd file.

    Any ideas?

    Thanks,
    Scott
     
    Scott Bass, Mar 21, 2005
    #1

  2. Scott Bass wrote:
    > Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
    > bunch of whitespace> Page 1 of x"
    >
    > What I need perl to spit out is:
    >
    > f1 /* Table 1.1.1 */
    > f2 /* Table 1.1.2 */
    > f3 /* Table 1.1.3.1.5 */
    >
    > etc.
    >
    > In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
    > spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
    > asterisk slash"
    >
    > However, I would also like the output sorted by the data in a 2nd file. So,
    > if that 2nd file is:
    >
    > Table 1.1
    > Table 1.5
    > Table 2.1
    > Table 3.1
    > Table 1.2
    > Table 2.2
    > Table 3.2
    > etc.
    >
    > then I would like the output sorted by the order as found in that 2nd file.
    >
    > Any ideas?


    You could write a program that does it.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Mar 21, 2005
    #2

  3. Scott Bass wrote:
    > Hi,
    >
    > Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
    > bunch of whitespace> Page 1 of x"
    >
    > What I need perl to spit out is:
    >
    > f1 /* Table 1.1.1 */
    > f2 /* Table 1.1.2 */
    > f3 /* Table 1.1.3.1.5 */
    >
    > etc.
    >
    > In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
    > spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
    > asterisk slash"
    >
    > However, I would also like the output sorted by the data in a 2nd file. So,
    > if that 2nd file is:
    >
    > Table 1.1
    > Table 1.5
    > Table 2.1
    > Table 3.1
    > Table 1.2
    > Table 2.2
    > Table 3.2
    > etc.
    >
    > then I would like the output sorted by the order as found in that 2nd file.
    >
    > Any ideas?



    Load the 2nd file into a hash:

    $order{'Table 1.1'} = 0;
    $order{'Table 1.5'} = 1;
    ...

    then sort based on the hash values:

    sub in_specified_order { # untested of course
        my($atable) = $a =~ /(Table \d+\.\d+)/;
        my($btable) = $b =~ /(Table \d+\.\d+)/;
        $order{$atable} <=> $order{$btable};
    }
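
A minimal sketch of the "load the 2nd file into a hash, then sort on the hash values" idea, for anyone who wants to try it. The in-memory order data, the sample output lines, and the helper name `rank_of` are made up for illustration; sending labels that are missing from the order file to the end is an assumption of this sketch, not something the thread specifies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 2nd file; in real use, open the file by name.
my $order_text = "Table 1.1\nTable 1.5\nTable 2.1\nTable 3.1\nTable 1.2\n";
open my $order_fh, '<', \$order_text or die "Cannot open order data: $!";

# Map each label to its position: 'Table 1.1' => 0, 'Table 1.5' => 1, ...
my %order;
my $position = 0;
while ( my $label = <$order_fh> ) {
    chomp $label;
    $order{$label} = $position++;
}
close $order_fh;

# Sort key: the 'Table N.N' prefix of an output line, looked up in %order.
# Labels not listed in the order file sort last (assumption of this sketch).
sub rank_of {
    my ($line) = @_;
    my ($label) = $line =~ /(Table \d+\.\d+)/;
    return ( defined $label && exists $order{$label} )
        ? $order{$label}
        : scalar keys %order;
}

# Made-up output lines in the requested "filename, two tabs, comment" format.
my @lines = (
    "f1\t\t/* Table 1.2.1 */",
    "f2\t\t/* Table 1.1.1 */",
    "f3\t\t/* Table 2.1.4 */",
    "f4\t\t/* Table 1.5.2 */",
);

my @sorted = sort { rank_of($a) <=> rank_of($b) } @lines;
print "$_\n" for @sorted;
```

With this sample data the files come out in the order f2, f4, f3, f1, which follows the order file rather than the numeric order of the table numbers.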


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Mar 21, 2005
    #3
  4. * Scott Bass wrote:

    > Hi,
    >
    > Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
    > bunch of whitespace> Page 1 of x"
    >
    > What I need perl to spit out is:
    >
    > f1 /* Table 1.1.1 */
    > f2 /* Table 1.1.2 */
    > f3 /* Table 1.1.3.1.5 */
    >
    > etc.
    >
    > In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
    > spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
    > asterisk slash"


    In your example, there is a blank between "/*" and "Table" ;)


    my @array;
    for my $file ( glob 'f*' ) {
        local $/ = \30; # read in chunks of 30 chars
        open my $fh, '<', $file or warn( $! ), next;
        my $chunk = <$fh>;
        push @array, "$file\t\t/* $chunk */";
    }


    Beware of first lines shorter than 30 chars, because the newline (and the
    start of the next line) will then appear in $chunk. Perhaps you should forget
    your pseudocode above and use something more reliable based on a regexp (but
    that depends on your input data):


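    my @array;
    for my $file ( glob 'f*' ) {
        open my $fh, '<', $file or warn( "$file: $!" ), next;
        my $line = <$fh>;
        next unless defined $line;    # skip empty files
        # skip files whose first line carries no table number
        my( $chunk ) = $line =~ m/(Table (\d+\.)*\d+)/ or next;
        # pass the filename as an argument so a '%' in it cannot break the format
        push @array, sprintf "%s\t\t/* %-30s */", $file, $chunk;
    }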
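
The short-line caveat with `$/ = \30` can be checked without touching any real files. This is only a sketch: the sample content is made up, and an in-memory filehandle stands in for one of the f* files.

```perl
use strict;
use warnings;

# Made-up file content whose first line is only 11 characters long.
my $content = "Table 1.1.1\nsecond line of the report\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";

my $chunk;
{
    local $/ = \30;    # read fixed-size records of 30 characters
    $chunk = <$fh>;
}
close $fh;

# The record runs past the end of line 1, so it contains the newline
# and the beginning of line 2.
print "chunk contains a newline\n" if $chunk =~ /\n/;

# One way to repair it: keep only what precedes the first newline,
# then drop any trailing padding.
my ($clean) = split /\n/, $chunk;
$clean =~ s/\s+$//;
print "clean: '$clean'\n";
```

Reading the whole first line and extracting the label with a regexp, as in the second loop above, avoids the problem altogether.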


    >
    > However, I would also like the output sorted by the data in a 2nd file. So,
    > if that 2nd file is:
    >
    > Table 1.1
    > Table 1.5
    > Table 2.1
    > Table 3.1
    > Table 1.2
    > Table 2.2
    > Table 3.2
    > etc.
    >
    > then I would like the output sorted by the order as found in that 2nd file.


    Take Tad's solution from this thread, but I'd change the regexp in his
    sorting routine to

    /(Table (\d+\.)*\d+)/

    to match on those items with more than one dot too.
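
A quick way to see the difference between the two patterns, using a made-up first line. The variable names are illustrative only, and the inner group is written as non-capturing `(?:...)` here, which matches the same text.

```perl
use strict;
use warnings;

# Made-up header line: label, padding, page counter.
my $first_line = 'Table 1.1.3.1.5' . ( ' ' x 20 ) . 'Page 1 of 7';

# Pattern from the sort routine: exactly two numeric levels.
my ($two_level) = $first_line =~ /(Table \d+\.\d+)/;

# Suggested pattern: any number of dotted levels.
my ($any_level) = $first_line =~ /(Table (?:\d+\.)*\d+)/;

print "two-level: $two_level\n";
print "any-level: $any_level\n";
```

Whichever pattern is used, the captured text has to match the keys of the order hash exactly, so the labels in the 2nd file need to be written at the same depth as what the pattern captures.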

    regards,
    fabian
     
    Fabian Pilkowski, Mar 21, 2005
    #4
  5. Big and Blue

    Big and Blue Guest

    Scott Bass wrote:

    >
    > In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
    > spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
    > asterisk slash"


    That's very easy to translate to actual Perl code. Something like:

    # Get all line 1 cols 1-30, tagged with filename
    my @data;
    while (<>) {
        push @data, [ $ARGV, substr($_, 0, 30) ];
        close ARGV;    # skip the rest of this file; <> moves on to the next one
    }

    Now just print it out in the 2 required formats. OK - that sort you want
    for part2 is *slightly* tricky. Here's one I wrote earlier which you can
    adapt as required...


    ========================


    ##################################################################
    # Compare 2 strings with (possibly) alternating numeric and text parts,
    # e.g., 21beta2
    # The comparison is done on the alternating parts in turn and stops when
    # an inequality is found.
    # '-' and '_' are ignored in the text-comparing part, so that 21beta3 is
    # more than 21beta_2.
    #
    sub _icmp($$) {
        my ($lhs, $rhs) = @_;

        while (length($lhs) or length($rhs)) {
            # Leading digits (possibly none), compared numerically
            (my $lhs_num, $lhs) = ($lhs =~ /(\d*)(.*)/);
            (my $rhs_num, $rhs) = ($rhs =~ /(\d*)(.*)/);
            my $num_diff = ($lhs_num <=> $rhs_num);
            return $num_diff if ($num_diff);

            # Leading non-digits (possibly none), compared as text
            (my $lhs_chr, $lhs) = ($lhs =~ /([^\d]*)(.*)/);
            $lhs_chr =~ tr/-_//d;
            (my $rhs_chr, $rhs) = ($rhs =~ /([^\d]*)(.*)/);
            $rhs_chr =~ tr/-_//d;

            my $chr_diff = ($lhs_chr cmp $rhs_chr);
            return $chr_diff if ($chr_diff);
        }
        return 0;
    }

    ##################################################################
    # Compare 2 version strings - part by part.
    # This uses _icmp to compare sub-parts, so "allows" for textual parts.
    # If the optional "sloppy" arg is set then extra parts are ignored, so
    # that, e.g., 20.2 is equal to 20.2.1.0.1
    #
    sub _vcomp($$;$) {
        my @va = split(/\./, shift);
        my @vb = split(/\./, shift);
        my $sloppy = shift || 0; # If sloppy, ignore extra sub-versions

        # We need to know the shorter one and only compare to that length.
        #
        my $both_len = (@va < @vb)? @va: @vb;
        for (my $i = 0; $i < $both_len; $i++) {
            my $diff = _icmp($va[$i], $vb[$i]);
            return $diff if ($diff);
        }
        # If we get here with equality then we check for any remaining parts
        #
        return ($sloppy? 0: (@va <=> @vb));
    }
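
If the table labels only ever contain digits and dots, a numeric-only comparator may be enough. The following is a simplified sketch of the same part-by-part idea, not the routine above: `dotted_cmp` and `number_of` are hypothetical names, the sample labels are made up, and it orders by table number rather than by the 2nd file.

```perl
use strict;
use warnings;

# Compare two dotted numbers such as '1.10' and '1.9' part by part,
# numerically, so that 1.10 sorts after 1.9.
sub dotted_cmp {
    my @left  = split /\./, $_[0];
    my @right = split /\./, $_[1];
    while ( @left && @right ) {
        my $diff = shift(@left) <=> shift(@right);
        return $diff if $diff;
    }
    # All shared parts are equal: the one with parts left over sorts later.
    return @left <=> @right;
}

# Pull the dotted number out of a label such as 'Table 1.1.3.1.5'.
sub number_of {
    my ($number) = $_[0] =~ /(\d+(?:\.\d+)*)/;
    return $number;
}

my @labels = ( 'Table 1.10', 'Table 1.2', 'Table 1.1.3.1.5', 'Table 1.1' );
my @sorted = sort { dotted_cmp( number_of($a), number_of($b) ) } @labels;
print "$_\n" for @sorted;
```

A plain string sort would put 'Table 1.10' before 'Table 1.2'; comparing each part as a number gives 1.1, 1.1.3.1.5, 1.2, 1.10 instead.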


    --
    Just because I've written it doesn't mean that
    either you or I have to believe it.
     
    Big and Blue, Mar 21, 2005
    #5
