newbie question

scarlet

Hello,
I have two files: file A.tok and file B.lst.
File A contains a table of words and their frequencies.
File B contains a list of words.
I have to generate a file C that contains the list of words from file A, and
if a word from file A matches a word from the list in file B, "VZ" has to
appear next to that word in file C.
How can I do this?

thank you
 
Jürgen Exner

[Subject: newbie question]

The first half of your subject is irrelevant and actually may cause some
people to score down your posting.
The second half is redundant because most initial postings involve a
question.

What have you tried so far? Where are you stuck? Do you have a problem
with designing the algorithm? Or do you have a problem with a specific
function or feature? Or isn't your code doing what it is supposed to do?

Actually, your question smells a little bit like homework....

jue
 
scarlet

This is what I have already:

$file="VZ.lst";
open(FILE,"$file");
while ($lijn=<FILE>){


@words=split(/\n\,$lijn);
foreach $element(@words){


$in="krantenartikel.tok";
open(IN,"$in");
while ($lijn1=<IN>){
chomp $lijn1;
($token,$freq)=split(/\t/,$lijn1);
}


if ($element=$token){
$freq="VZ";
}
else {
$freq="";}
}
}
$out='#krantenartikel.vz#';
open(OUT,">$out");
print OUT "$token\t$freq\n";

First, I open the .lst file and define the array it contains. Then, I open
the other file and make a table of the words and their frequency. I want to
make a new file, "krantenartikel.vz", that contains the elements I mentioned
earlier.

I know the command "if($element=$token)" is wrong, but my problem is that I
don't know how else to write it so that it works.
 
Jürgen Exner

[Do not stealth-CC me, I happen to read the NGs I am posting in]
[Do not top-post, that is poor style; trying to repair]

scarlet said:
This is what I have already:

Missing
use strict; use warnings;
$file="VZ.lst";
open(FILE,"$file");

You should always test if an open() was successful:
open(FILE,"$file") or die("Could not open $file because $!\n");
while ($lijn=<FILE>){
@words=split(/\n\,$lijn);

This line causes a syntax error. I think you meant
@words=split(/\n/,$lijn);
instead.

But I don't think it does what you meant it to do.
You are reading the file line by line. That means there is exactly one
newline, at the very end of each string, so there is not much sense in
splitting on it. I think all you want here is a plain chomp() on
the line itself. Or, if each line can contain multiple words, then a
split() on white space or whatever separates those words, but not on
newline.
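
To illustrate the difference, here is a minimal sketch. The two sample strings are made up for the example and only stand in for lines read from VZ.lst; the real file format is not known.

```perl
use strict;
use warnings;

# Hypothetical sample lines, shaped like what <FILE> would return
my $one_word   = "van\n";
my $many_words = "van  in op\n";

# One word per line: chomp() just removes the trailing newline
chomp(my $word = $one_word);

# Several words per line: split ' ' breaks on runs of white space
# and discards the trailing newline as well
my @words = split ' ', $many_words;

print "[$word]\n";
print join('|', @words), "\n";
```

The brackets and the `|` separators are only there to make stray white space visible in the output.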
foreach $element(@words){
$in="krantenartikel.tok";

Proper indentation makes the scope of a loop and in particular nested
loops much, much easier to recognize.
open(IN,"$in");

You should always test if an open() was successful:
open(IN,"$in") or die("Could not open $in because $!\n");
while ($lijn1=<IN>){
chomp $lijn1;
Good.
($token,$freq)=split(/\t/,$lijn1);
Nice.

}
if ($element=$token){

As you noted yourself this is an assignment and certainly not what you
want. Even ($element==$token) would be wrong because it would compare
the numerical values of those two strings.
To compare the textual value of two scalars use
($element eq $token)
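
A small sketch of why == gives the wrong answer for words. The two values are invented sample words, not taken from the poster's files.

```perl
use strict;
use warnings;

# Two different made-up words
my ($element, $token) = ('van', 'op');

# Numeric comparison: both strings convert to 0, so this is true.
# "use warnings" would normally complain that the values are not
# numeric; the warning is silenced here only to keep the demo quiet.
my $numeric_equal = do { no warnings 'numeric'; $element == $token };

# String comparison: compares the text itself, so this is false.
my $string_equal = $element eq $token;

print "== says: ", ($numeric_equal ? "equal" : "different"), "\n";
print "eq says: ", ($string_equal  ? "equal" : "different"), "\n";
```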
$freq="VZ";
} else {
$freq="";}
}
}
$out='#krantenartikel.vz#';
open(OUT,">$out");

You should always test if an open() was successful:
open(OUT,">$out") or die("Could not open $out because $!\n");
print OUT "$token\t$freq\n";

First, I open the .lst file and define the array it contains. Then, I open
the other file and make a table of the words and their frequency. I want to
make a new file, "krantenartikel.vz", that contains the elements I mentioned
earlier.

There are a few more conceptual and algorithmic problems with your code.

The most obvious issue is that you are printing only a single item to
your output file. This is because the outermost while() ends before the
print(), so the print is called exactly once, at the very end
of the program. Had you used proper indentation then this would have
been obvious (I actually ran your code through indent-region in emacs).

Same problem with the if(). It is executed AFTER the innermost while()
loop has already terminated, thus you are testing only against the very
last line of the krantenartikel.tok file.

Both issues can be fixed with little effort, but your code is also very
inefficient: for each line in VZ.lst you are looping through the whole
krantenartikel.tok file. That is very costly: at O(n*m) it is a quadratic
algorithm. It would be easy enough to do much better than that by just
reading all of krantenartikel.tok into memory once and then looping over
the in-memory copy.

However, Perl has a data structure that makes checking "does X exist"
really trivial and very, very fast: a hash.

So, the revised plan is:
- create a hash where the tokens from krantenartikel.tok are the keys
- open the output file
- open VZ.lst and for each word in that file
    - check if it exists in the hash
    - and print the proper output line
- close and cleanup everything

All together I am getting this code which compiles but which I couldn't
test further because I don't have any test data:

use strict; use warnings;

my %tokens;

my $in="krantenartikel.tok";
open(IN,"$in") or die("Cannot open $in: $!\n");
while (my $lijn1=<IN>){
    chomp $lijn1;
    my ($tok,$freq)=split(/\t/,$lijn1);
    $tokens{$tok} = $freq;
    #we don't really need to store the frequency, but because we
    #need some dummy value anyway we can just as well use that one
}
close(IN);

my $out='#krantenartikel.vz#';
open(OUT,">$out") or die("Cannot open $out: $!\n");
my $file="VZ.lst";
open(FILE,"$file") or die("Cannot open $file: $!\n");

while (my $lijn=<FILE>){
    #I am assuming VZ.lst contains one word per line
    chomp $lijn;
    if (exists($tokens{$lijn})){
        print OUT "$lijn\tVZ\n";
    } else {
        print OUT "$lijn\n";
    }
}

close FILE;
close OUT or die("Problem closing $out: $!\n");
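
Note that the original question asks for the words from the .tok file, marked where they also occur in the list, whereas the program above walks through VZ.lst. If that reading of the question is right, the roles of the two files simply swap: the hash is built from the word list and the loop runs over the tokens. A minimal sketch of that variant follows; the sub name mark_tokens and the sample data are made up for illustration, and it assumes tab-separated lines in the .tok file and one word per line in the list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one output line per token, with "VZ"
# appended where the token also appears in the word list.
# $tok_lines:  array ref of "word<TAB>frequency" lines (assumed format)
# $list_words: array ref of lines holding one word each (assumed format)
sub mark_tokens {
    my ($tok_lines, $list_words) = @_;

    # Build the lookup hash from the word list
    my %in_list;
    for my $w (@$list_words) {
        chomp(my $word = $w);
        $in_list{$word} = 1;
    }

    # Walk the tokens in their original order and mark the matches
    my @out;
    for my $line (@$tok_lines) {
        chomp(my $l = $line);
        my ($tok) = split /\t/, $l;
        next unless defined $tok && length $tok;   # skip blank lines
        push @out, exists $in_list{$tok} ? "$tok\tVZ" : $tok;
    }
    return @out;
}

# Made-up sample data standing in for krantenartikel.tok and VZ.lst
my @tok  = ("de\t12\n", "van\t7\n", "huis\t3\n");
my @list = ("van\n", "op\n");
print "$_\n" for mark_tokens(\@tok, \@list);
```

With real files, the two arrays would be filled by reading krantenartikel.tok and VZ.lst, and the result printed to the output filehandle instead of STDOUT.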



jue
 
sln

-sln
-----
the out:
d
e
f
cVZ
d
aVZ
z
bVZ aVZ
-----

use strict;
use warnings;

my $tokstring = "a afreq \n b bfreq \n c cfreq ";
my $bstring = "d \ne \nf \nc \nd \na \nz \nb a\n ";

open my $tfile, '<', \$tokstring or die "can't open tok file: $!";
my %toks = map { /\s*([^\s]+)\s+([^\s]*)/ ? ($1, $2) : () } <$tfile>;
close $tfile;

open my $bfile, '<', \$bstring or die "can't open bstr file: $!";
while (<$bfile>)
{
    s/([^\s]+)(?=\s+)/exists $toks{$1} ? $1.'VZ': $1/ge;
    print;
}
close $bfile;
 
ccc31807

Hello,
I have two files : file A.tok and file B.lst

It would be helpful if you posted a sample of each file, so we would
know exactly what the files look like.

CC.
 
