finding common words

Tore Aursand · Feb 25, 2004

Thanks to all.

No problem, but please stop top-posting. It's a bad thing.

Can I ask another newbie question please? The solutions provided are
working fine but sometimes depending on the lengths of words, I don't
get aligned results, e.g.

apple 2.2
boy 9.0
definite 2.5
eel 5.0

Is there a way of aligning these comments?

Yes, and it's very simple. I won't tell you how, however, because I
think it's best if you try it yourself first and then, if need be, come
back here with the code you're having problems with.

Please don't call me rude; I think it's a nice exercise for you to try
to solve this on your own.

If you don't want to roll your own, please consider using the
Text::Table module (it's excellent).

Tad McClellan · Feb 25, 2004

viv2k said:
apple 2.2
boy 9.0
definite 2.5
eel 5.0

Is there a way of aligning these comments?

Those are not comments. They appear to be output.

perldoc -f printf
perldoc -f sprintf

Hunter Johnson · Feb 25, 2004

Uri Guttman said:
HJ> If the docs say there is no list generated, how is map going to tell
HJ> the reader that the docs are wrong?

HJ> If a reader reads something extra into what's written (like "here
HJ> comes a list" when none is coming), how is that more the writer's
HJ> problem than the reader's? Either the writer can write it
HJ> differently (using 'for' in this case) or the reader can read it
HJ> differently. I don't see how the former is the only right answer.

it is a semantic communication to the reader of the code. it is your
(the coder's) responsibility to convey as much accurate information to
the reader as possible. map in a void context is not as accurate as a
for modifier even with the optimization.

Yes, it is because, as you say:

sure the docs will say it won't generate a list

Of course, you then qualify that with:

but its history has always been that way.

Historically, map wasn't even in perl, but people use it with its
current implementation anyway. That's all I'm suggesting will happen
now -- no list in void context means that programmers can (and should,
as they like) use it in void context, and readers will have to learn
that the language has changed, which it has.

Hunter

Gunnar Hjalmarsson · Feb 25, 2004

[ Please learn how to reply properly. You should respond *below* the
quoted text. ]

sorry for those 'unsophisticated' questions Gunnar. But it's first
time I'm playing with Perl and geez..

Note that it was my solution I called unsophisticated, seeing all the
other solutions that came up in this thread.

I'll definitely learn Perl in more detail in future

Why not now, so that you can accomplish this task the way you want it
to be?

The solutions provided are working fine but sometimes depending on
the lengths of words, I don't get aligned results, e.g.

apple 2.2
boy 9.0
definite 2.5
eel 5.0

Is there a way of aligning these comments?

Yes, and I agree with Tore that it's high time you started learning
how to figure things out with the help of the documentation and the
available modules. Tore mentioned a module, and Tad mentioned a couple
of functions.

If you encounter problems, please feel welcome to ask for help here,
but now we want to see code that you wrote.

Good luck!

Tore Aursand · Feb 25, 2004

Those are not comments. They appear to be output.

perldoc -f printf
perldoc -f sprintf

Hmm. Isn't it also appropriate to mention 'perldoc -f length'? I can't
think of a way to solve this with just (s)printf...?
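
On the length question, here is a minimal sketch of how length and (s)printf could be combined. It is not a solution anyone posted in this thread; the hash name %score and the sample values (taken from the question) are assumptions for illustration. The "*" in the format takes the field width from the argument list, so the column adapts to the longest word.

```perl
use strict;
use warnings;

# Sample data from the question; the hash name %score is hypothetical.
my %score = ( apple => 2.2, boy => 9.0, definite => 2.5, eel => 5.0 );

# Find the longest word so the first column can be padded to that width.
my $width = 0;
for my $word ( keys %score ) {
    $width = length($word) if length($word) > $width;
}

# "%-*s" left-justifies the word in a field whose width is read from
# the argument list; "%.1f" prints the number with one decimal place.
my @lines = map { sprintf '%-*s %.1f', $width, $_, $score{$_} }
            sort keys %score;

print "$_\n" for @lines;
```

If the widest word is known in advance, a fixed format such as '%-10s %.1f' would do without the length pass.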

Jean-Pierre Vidal · Feb 25, 2004

Thanks to all. I know many of u here are experts in Perl.. sorry for
those 'unsophisticated' questions Gunnar. But it's first time I'm
playing with Perl and geez.. it seems to be very powerful in text
manipulation...I'll definitely learn Perl in more detail in future

Can I ask another newbie question please? The solutions provided are
working fine but sometimes depending on the lengths of words, I don't
get aligned results, e.g.

apple 2.2
boy 9.0
definite 2.5
eel 5.0

Is there a way of aligning these comments?

perldoc perlform ?

Jean-Pierre

viv2k · Mar 1, 2004

Gunnar Hjalmarsson said:
If you encounter problems, please feel welcome to ask for help here,
but now we want to see code that you wrote.

Ok guys, I've got a slightly different requirement now. The lists I
have to compare have increased to 3 and I need to find out words that
appear in at least two of the lists. However, if a word appears in
only 2 list, I need to include it in my third list by giving it a
default value, like '0'.

Example:
ListA
Apple 4, Boy 3, Cat 5
ListB
Apple 1.0, Baby 2.1, Cat 3.3
ListC
Apple 99, Beef 100, Cow 101

Should give me as result:
ListA      ListB       ListC
Apple 4    Apple 1.0   Apple 99
Cat 5      Cat 3.3     Cat 0

Right now, my code looks like below but I don't know how to give this
default value. Any help is much appreciated.

my %ListA;
open my $fh, '< ListA.txt' or die "Couldn't open ListA.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListA{$key} = $value;
    }
}
close $fh;

my %ListB;
open my $fh, '< ListB.txt' or die "Couldn't open ListB.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListB{$key} = $value;
    }
}
close $fh;

my %ListC;
open my $fh, '< ListC.txt' or die "Couldn't open ListC.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListC{$key} = $value;
    }
}
close $fh;

for (keys %ListA) {
    delete $ListA{$_} unless exists $ListB{$_} || $ListC{$_} ;
}
for (keys %ListB) {
    delete $ListB{$_} unless exists $ListA{$_} || $ListC{$_};
}
for (keys %ListC) {
    delete $ListC{$_} unless exists $ListA{$_} || $ListB{$_};
}

print "ListA \n";
print "$_\t$ListA{$_}\n" for sort keys %ListA;
print "\n";
print "ListB \n";
print "$_\t$ListB{$_}\n" for sort keys %ListB;
print "\n";
print "ListC \n";
print "$_\t$ListC{$_}\n" for sort keys %ListC;

Gunnar Hjalmarsson · Mar 1, 2004

viv2k said:
Ok guys, I've got a slightly different requirement now. The lists I
have to compare have increased to 3 and I need to find out words
that appear in at least two of the lists. However, if a word
appears in only 2 list, I need to include it in my third list by
giving it a default value, like '0'.

Example:
ListA
Apple 4, Boy 3, Cat 5
ListB
Apple 1.0, Baby 2.1, Cat 3.3
ListC
Apple 99, Beef 100, Cow 101

Should give me as result:
ListA      ListB       ListC
Apple 4    Apple 1.0   Apple 99
Cat 5      Cat 3.3     Cat 0

Right now, my code looks like below but I don't know how to give
this default value. Any help is much appreciated.

Well, you need to somehow count the number of occurrences of each
word, right? Below find a suggestion that does just that.

my %ListA;
open my $fh, '< ListA.txt' or die "Couldn't open ListA.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListA{$key} = $value;
    }
}
close $fh;

my %ListB;
open my $fh, '< ListB.txt' or die "Couldn't open ListB.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListB{$key} = $value;
    }
}
close $fh;

my %ListC;
open my $fh, '< ListC.txt' or die "Couldn't open ListC.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListC{$key} = $value;
    }
}
close $fh;

# Drop words that appear in only one list. Both operands need "exists",
# otherwise a stored value of 0 counts as missing.
for (keys %ListA) {
    delete $ListA{$_} unless exists $ListB{$_} || exists $ListC{$_};
}
for (keys %ListB) {
    delete $ListB{$_} unless exists $ListA{$_} || exists $ListC{$_};
}
for (keys %ListC) {
    delete $ListC{$_} unless exists $ListA{$_} || exists $ListB{$_};
}

# Count in how many lists each remaining word occurs.
my %count;
$count{$_}++ for keys %ListA, keys %ListB, keys %ListC;

# Give a word found in at least two lists a default of 0 where missing.
for (keys %count) {
    if ($count{$_} > 1) {
        $ListA{$_} ||= 0;
        $ListB{$_} ||= 0;
        $ListC{$_} ||= 0;
    }
}

print "ListA \n";
print "$_\t$ListA{$_}\n" for sort keys %ListA;
print "\n";
print "ListB \n";
print "$_\t$ListB{$_}\n" for sort keys %ListB;
print "\n";
print "ListC \n";
print "$_\t$ListC{$_}\n" for sort keys %ListC;

Even if this solution gets the job done, there are likely more
efficient ways to do it.

Tore Aursand · Mar 1, 2004

Ok guys, I've got a slightly different requirement now. The lists I
have to compare have increased to 3 and I need to find out words that
appear in at least two of the lists. However, if a word appears in
only 2 list, I need to include it in my third list by giving it a
default value, like '0'.

The easiest solution is to keep track of how many lists each entry
appears in, i.e. by doing something like this while iterating through
each list:

open my $fh, '< ListA.txt' or die "Couldn't open ListA.txt $!";
while (<$fh>) {
    for (split /,\s*/) {
        my ($key, $value) = split;
        $ListA{$key} = $value;
        $all{$key}++;
    }
}
close $fh;

This way, you'll always know how many lists each '$key' is in (by looking
at $all{$key}).
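
Put together, the counting idea could look roughly like the sketch below. To keep it self-contained, the three files are replaced by in-memory strings, and the names %raw, %list and @common are assumptions that do not appear in the thread. One deliberate difference from the fragment above: a word is counted only once per list, so a word repeated within a single list is not mistaken for a common one.

```perl
use strict;
use warnings;

# In-memory stand-ins for ListA.txt, ListB.txt and ListC.txt
# (an assumption, so that the sketch runs without any files).
my %raw = (
    ListA => 'Apple 4, Boy 3, Cat 5',
    ListB => 'Apple 1.0, Baby 2.1, Cat 3.3',
    ListC => 'Apple 99, Beef 100, Cow 101',
);

my ( %list, %all );
for my $name ( sort keys %raw ) {
    for my $pair ( split /,\s*/, $raw{$name} ) {
        my ( $key, $value ) = split ' ', $pair;

        # Count each word at most once per list.
        $all{$key}++ unless exists $list{$name}{$key};
        $list{$name}{$key} = $value;
    }
}

# Words that occur in at least two of the lists.
my @common = sort grep { $all{$_} >= 2 } keys %all;
print "@common\n";    # prints "Apple Cat"
```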

Anno Siegel · Mar 1, 2004

Gunnar Hjalmarsson said:
viv2k wrote:
[...]

Example:
ListA
Apple 4, Boy 3, Cat 5
ListB
Apple 1.0, Baby 2.1, Cat 3.3
ListC
Apple 99, Beef 100, Cow 101

Should give me as result:
ListA      ListB       ListC
Apple 4    Apple 1.0   Apple 99
Cat 5      Cat 3.3     Cat 0

Right now, my code looks like below but I don't know how to give
this default value. Any help is much appreciated.


Well, you need to somehow count the number of occurrences of each
word, right? Below find a suggestion that does just that.

[setup of %ListA, %ListB and %ListC]

my %count;
$count{$_}++ for keys %ListA, keys %ListB, keys %ListC;

for (keys %count) {
    if ($count{$_} > 1) {
        $ListA{$_} ||= 0;
        $ListB{$_} ||= 0;
        $ListC{$_} ||= 0;
    }
}

Even if this solution gets the job done, there are likely more
efficient ways to do it.

I don't know about efficiency, it doesn't seem to be an issue here.

I'd make the individual %ListA .. %ListC an array of hashes, and write
the code so that it works for an arbitrary number of them. At the same
time, the code gets more compact (if that is an advantage). Also,
avoid tabs like the plague. The meaning of tabs is too ill-defined for
them to be of practical use. Use (s)printf for formatting, or use
Text::Table. Thirdly, Perl's default value for numbers *is* 0. The
idiom " ... || 0" (soon " ... // 0", yay) has its place, but isn't
needed here.

Assume "@lists = \ ( %ListA, %ListB, %ListC)" to connect to previous
code. One would read the data directly into @lists.

my %count;
$count{ $_} ++ for map keys %$_, @lists;

for my $item ( sort grep $count{ $_} >= 2, keys %count ) {
    # interleave copies of $item with prices
    my @vals = map { $item, $_->{ $item} } @lists;
    no warnings 'uninitialized'; # take care of default
    printf '%6s %5.2f ' x @lists . "\n", @vals;
}
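
For readers who want to try the array-of-hashes approach end to end, here is a hedged sketch of reading the data directly into @lists. The in-memory @raw strings stand in for the three files, and the names @raw, @rows and $format are assumptions, not part of the code above. It also spells out the default with "// 0" (Perl 5.10 or later) rather than silencing the warning.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# In-memory stand-ins for the list files (an assumption; the thread
# reads ListA.txt, ListB.txt and ListC.txt instead).
my @raw = (
    'Apple 4, Boy 3, Cat 5',
    'Apple 1.0, Baby 2.1, Cat 3.3',
    'Apple 99, Beef 100, Cow 101',
);

# Build one hash per list and store a reference to it in @lists.
my @lists;
for my $line (@raw) {
    my %list;
    for my $pair ( split /,\s*/, $line ) {
        my ( $key, $value ) = split ' ', $pair;
        $list{$key} = $value;
    }
    push @lists, \%list;
}

# Count in how many lists each word occurs.
my %count;
$count{$_}++ for map { keys %$_ } @lists;

# One "word value" column pair per list, however many lists there are.
my $format = join ' ', ('%-6s %6.2f') x @lists;

my @rows;
for my $item ( sort grep { $count{$_} >= 2 } keys %count ) {
    # Interleave copies of $item with its values, defaulting to 0.
    my @vals = map { ( $item, $_->{$item} // 0 ) } @lists;
    push @rows, sprintf $format, @vals;
}
print "$_\n" for @rows;
```

Adding a fourth list would only mean adding a fourth string to @raw; the counting and the format adapt on their own.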

Anno

Gunnar Hjalmarsson · Mar 1, 2004

Anno said:
I'd make the individual %ListA .. %ListC an array of hashes, and
write the code so that it works for an arbitrary number of them.

I thought of that, but skipped it since OP isn't likely ready for
complex data structures.

Also, avoid tabs like the plague. The meaning of tabs is too
ill-defined for them to be of practical use. Use (s)printf for
formatting, or use Text::Table.

That aspect was discussed previously in the thread.

Thirdly, Perl's default value for numbers *is* 0. The idiom " ...
|| 0" (soon " ... // 0", yay) has its place, but isn't needed here.

... provided (s)printf() and disabling the uninitialized warning ...
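
To make the point about defaults concrete, the following is a small illustrative sketch, not code from the thread; the hash %price and its contents are hypothetical. It shows that a missing entry already formats as 0 once the "uninitialized" warning is switched off, and how "||" and "//" differ when a stored value is a genuine 0. Note that "//" requires Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # the // operator needs Perl 5.10 or later

my %price = ( Apple => 99, Free => 0 );    # hypothetical data

# A missing entry is undef. In numeric formatting undef counts as 0,
# so only the "uninitialized" warning has to be silenced.
my $formatted = do {
    no warnings 'uninitialized';
    sprintf '%.2f', $price{Cat};
};

# || replaces any false value, // replaces only undef.
my $missing  = $price{Cat}  || 'n/a';    # undef is false: replaced
my $zero_or  = $price{Free} || 'n/a';    # 0 is false too: replaced
my $zero_dor = $price{Free} // 'n/a';    # 0 is defined: kept

print "$formatted $missing $zero_or $zero_dor\n";    # 0.00 n/a n/a 0
```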
