Read string from multiple files, output ordered by a 2nd file

Scott Bass · Mar 21, 2005

Hi,

Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
bunch of whitespace> Page 1 of x"

What I need perl to spit out is:

f1 /* Table 1.1.1 */
f2 /* Table 1.1.2 */
f3 /* Table 1.1.3.1.5 */

etc.

In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
asterisk slash"

However, I would also like the output sorted by the data in a 2nd file. So,
if that 2nd file is:

Table 1.1
Table 1.5
Table 2.1
Table 3.1
Table 1.2
Table 2.2
Table 3.2
etc.

then I would like the output sorted by the order as found in that 2nd file.

Any ideas?

Thanks,
Scott

Gunnar Hjalmarsson · Mar 21, 2005

Scott said:
Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
bunch of whitespace> Page 1 of x"

What I need perl to spit out is:

f1 /* Table 1.1.1 */
f2 /* Table 1.1.2 */
f3 /* Table 1.1.3.1.5 */

etc.

In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
asterisk slash"

However, I would also like the output sorted by the data in a 2nd file. So,
if that 2nd file is:

Table 1.1
Table 1.5
Table 2.1
Table 3.1
Table 1.2
Table 2.2
Table 3.2
etc.

then I would like the output sorted by the order as found in that 2nd file.

Any ideas?

You could write a program that does it.

Tad McClellan · Mar 21, 2005

Scott Bass said:
Hi,

Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
bunch of whitespace> Page 1 of x"

What I need perl to spit out is:

f1 /* Table 1.1.1 */
f2 /* Table 1.1.2 */
f3 /* Table 1.1.3.1.5 */

etc.

In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
asterisk slash"

However, I would also like the output sorted by the data in a 2nd file. So,
if that 2nd file is:

Table 1.1
Table 1.5
Table 2.1
Table 3.1
Table 1.2
Table 2.2
Table 3.2
etc.

then I would like the output sorted by the order as found in that 2nd file.

Any ideas?

Load the 2nd file into a hash:

$order{'Table 1.1'} = 0;
$order{'Table 1.5'} = 1;
...

then sort based on the hash values:

sub in_specified_order { # untested of course
    my($atable) = $a =~ /(Table \d+\.\d+)/;
    my($btable) = $b =~ /(Table \d+\.\d+)/;
    $order{$atable} <=> $order{$btable};
}
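
To make the two steps above concrete, here is a minimal sketch of the same idea: build the rank hash from the lines of the 2nd file, then sort the output lines by rank. The names build_order and sort_by_order are made up for illustration, and two behaviours are assumptions not taken from the thread: labels missing from the 2nd file sort last, and the label must match an entry in the 2nd file exactly.

```perl
use strict;
use warnings;

# Map each label from the ordering file to its position (0, 1, 2, ...).
# Takes the lines of the 2nd file as a list; blank lines are skipped.
sub build_order {
    my %order;
    my $rank = 0;
    for my $line (@_) {
        (my $name = $line) =~ s/^\s+|\s+$//g;   # trim newline and spaces
        next unless length $name;
        $order{$name} = $rank++ unless exists $order{$name};
    }
    return \%order;
}

# Sort output lines by the rank of the "Table x.y..." label they contain.
# Assumption: lines whose label is not in the ordering file go last.
sub sort_by_order {
    my ($order, @lines) = @_;
    my $last = keys %$order;                    # rank for unlisted labels
    my @keyed = map {
        my ($table) = /(Table (?:\d+\.)*\d+)/;
        my $rank = ( defined $table && exists $order->{$table} )
                 ? $order->{$table}
                 : $last;
        [ $rank, $_ ];
    } @lines;
    # Ties are broken by plain string comparison to keep the result stable.
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } @keyed;
}

my $order = build_order( "Table 1.1\n", "Table 1.5\n", "Table 1.2\n" );
print "$_\n" for sort_by_order(
    $order,
    "f1\t\t/* Table 1.2 */",
    "f2\t\t/* Table 1.1 */",
    "f3\t\t/* Table 1.5 */",
);
```

In a real script the argument to build_order would be the lines read from the 2nd file rather than a literal list.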

Fabian Pilkowski · Mar 21, 2005

* Scott Bass said:
Hi,

Say I have 100 files, f1 - f100. Say the first line is "Table 1.1.1 <a
bunch of whitespace> Page 1 of x"

What I need perl to spit out is:

f1 /* Table 1.1.1 */
f2 /* Table 1.1.2 */
f3 /* Table 1.1.3.1.5 */

etc.

In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
asterisk slash"

In your example, there is a blank between "/*" and "Table":

my @array;
for my $file ( glob 'f*' ) {
    local $/ = \30; # read in chunks of 30 chars
    open my $fh, '<', $file or warn( $! ), next;
    my $chunk = <$fh>;
    push @array, "$file\t\t/* $chunk */";
}

Beware of lines shorter than 30 chars, because the newline (and the start
of the next line) will end up in $chunk then. Perhaps you should forget your
pseudocode above and use something more reliable, such as a regex match
(though that depends on your input data):

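my @array;
for my $file ( glob 'f*' ) {
    open my $fh, '<', $file or warn( "$file: $!" ), next;
    my $line = <$fh>;
    next unless defined $line;     # skip empty files
    my( $chunk ) = $line =~ m/(Table (\d+\.)*\d+)/;
    next unless defined $chunk;    # first line holds no table label
    # pass $file as an argument so a '%' in a filename cannot break the format
    push @array, sprintf "%s\t\t/* %-30s */", $file, $chunk;
}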
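
If you do want to stay with the "columns 1-30" wording, the short-line problem mentioned above can be avoided by reading the whole first line and cutting it afterwards. A rough sketch, where format_entry is a made-up helper name and stripping the trailing whitespace is my assumption about the wanted output:

```perl
use strict;
use warnings;

# Build one output line from a filename and that file's first line.
# Takes columns 1-30, without the newline and without trailing blanks.
sub format_entry {
    my ($file, $line) = @_;
    $line = '' unless defined $line;      # empty file: no first line
    $line =~ s/\r?\n\z//;                 # drop the line ending
    my $label = substr( $line, 0, 30 );   # short lines are returned as-is
    $label =~ s/\s+\z//;                  # assumption: padding is unwanted
    return "$file\t\t/* $label */";
}

print format_entry( 'f1', "Table 1.1.1" . ( ' ' x 40 ) . "Page 1 of 9\n" ), "\n";
print format_entry( 'f2', "Table 2\n" ), "\n";
```

Because the newline is removed before substr is applied, a first line shorter than 30 characters cannot leak a line break into the comment.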

However, I would also like the output sorted by the data in a 2nd file. So,
if that 2nd file is:

Table 1.1
Table 1.5
Table 2.1
Table 3.1
Table 1.2
Table 2.2
Table 3.2
etc.

then I would like the output sorted by the order as found in that 2nd file.

Take Tad's solution from this thread, but I'd change the regexp in his
sorting routine to

/(Table (\d+\.)*\d+)/

to match on those items with more than one dot too.
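
A small sketch to show the difference between the two patterns on a label with several dots. capture_labels is a made-up name, and I use a non-capturing group (?:...) in the second pattern, which matches the same text as the regexp above:

```perl
use strict;
use warnings;

# Return what each of the two patterns from this thread captures.
sub capture_labels {
    my ($text) = @_;
    my ($short) = $text =~ /(Table \d+\.\d+)/;        # exactly one dot
    my ($full)  = $text =~ /(Table (?:\d+\.)*\d+)/;   # any number of dots
    return ( $short, $full );
}

my ( $short, $full ) = capture_labels("f3\t\t/* Table 1.1.3.1.5 */");
print "short: $short\n";   # short: Table 1.1
print "full:  $full\n";    # full:  Table 1.1.3.1.5
```

With the one-dot pattern, "Table 1.1.1" and "Table 1.1.2" would both be looked up as "Table 1.1" and so compare as equal in the sort.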

regards,
fabian

Big and Blue · Mar 21, 2005

Scott said:
In pseudocode: "take columns 1-30 from line 1 from 100 separate files, and
spit out the filename, two tabs, slash asterisk, the text from columns 1-30,
asterisk slash"

That's very easy to translate to actual Perl code. Something like:

# Get all line 1 cols 1-30, tagged with filename
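my @data;
while (<>) {
    chomp;       # keep the newline out of short lines
    push @data, [ $ARGV, substr($_, 0, 30) ];
    close ARGV;  # skip the rest of this file; <> moves on to the next one
}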

Now just print it out in the 2 required formats. OK - that sort you want
for part2 is *slightly* tricky. Here's one I wrote earlier which you can
adapt as required...

========================

##################################################################
# Compare 2 strings with (possibly) alternating numeric and text parts,
# e.g., 21beta2
# The comparison is done on the alternating parts in turn and stops when an
# inequality is found.
# '-' and '_' are ignored in the text-comparing part, so that 21beta3 is
# more than 21beta_2.
#
sub _icmp($$) {
    my ($lhs, $rhs) = @_;

    my $res;
    while (length($lhs) or length($rhs)) {
        (my $lhs_num, $lhs) = ($lhs =~ /(\d*)(.*)/);
        (my $rhs_num, $rhs) = ($rhs =~ /(\d*)(.*)/);
        my $num_diff = ($lhs_num <=> $rhs_num);
        return $num_diff if ($num_diff);

        (my $lhs_chr, $lhs) = ($lhs =~ /([^\d]*)(.*)/);
        $lhs_chr =~ tr/-_//d;
        (my $rhs_chr, $rhs) = ($rhs =~ /([^\d]*)(.*)/);
        $rhs_chr =~ tr/-_//d;

        my $chr_diff = ($lhs_chr cmp $rhs_chr);
        return $chr_diff if ($chr_diff);
    }
    return 0;
}

##################################################################
# Compare 2 version strings - part by part.
# This uses _icmp to compare sub-parts, so "allows" for textual parts.
# If the optional "sloppy" arg is set then extra parts are ignored, so
# that, e.g., 20.2 is equal to 20.2.1.0.1
#
sub _vcomp($$;$) {
    my @va = split(/\./, shift);
    my @vb = split(/\./, shift);
    my $sloppy = shift || 0; # If sloppy, ignore extra sub-versions

    # We need to know the shorter one and only compare to that length.
    #
    my $both_len = (@va < @vb)? @va: @vb;
    for (my $i = 0; $i < $both_len; $i++) {
        my $diff = _icmp($va[$i], $vb[$i]);
        return $diff if ($diff);
    }
    # If we get here with equality then we check for any remaining parts
    #
    return ($sloppy? 0: (@va <=> @vb));
}
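
For labels that are purely numeric between the dots (1.1, 1.1.3.1.5 and so on), a much reduced sketch of the same part-by-part idea might look like the following. dotted_cmp is a made-up name, it is not a replacement for the routines above, and it assumes the "Table " prefix has already been stripped:

```perl
use strict;
use warnings;

# Compare two dotted numbers part by part, numerically.
# Assumption: every part is a plain non-negative integer.
sub dotted_cmp {
    my @x = split /\./, shift;
    my @y = split /\./, shift;
    while ( @x && @y ) {
        my $diff = shift(@x) <=> shift(@y);
        return $diff if $diff;
    }
    return @x <=> @y;    # the one with parts left over sorts later
}

my @sorted = sort { dotted_cmp( $a, $b ) } qw(1.10 1.2 1.1.3.1.5 1.1);
print "@sorted\n";   # 1.1 1.1.3.1.5 1.2 1.10
```

Note that this gives numeric order, not the order found in the 2nd file, so it only helps if the 2nd file happens to be sorted that way.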
