newbie question

Discussion in 'Perl Misc' started by scarlet, Dec 13, 2009.

  1. scarlet

    scarlet Guest

    Hello,
    I have two files: file A.tok and file B.lst.
    File A contains a hash table of words and their frequency.
    File B contains a list of words.
    I have to generate a file C that contains the list of words from file A, and
    if a word from file A matches a word from the list in file B, "VZ" has to
    appear next to that word in file C.
    How can I do this?

    thank you
     
    scarlet, Dec 13, 2009
    #1

  2. scarlet

    scarlet Guest

    Tad,
    that is just the problem :(
    I don't know how to write a program.

    greetings,


    "Tad McClellan" <> wrote in message
    news:...
    > scarlet <> wrote:
    >
    >
    >> Subject: newbie question

    >
    > Please put the subject of your article in the Subject of your article.
    >
    >
    >> I have two files: file A.tok and file B.lst.
    >> File A contains a hash table of words and their frequency.
    >> File B contains a list of words.
    >> I have to generate a file C that contains the list of words from file A,
    >> and if a word from file A matches a word from the list in file B, "VZ"
    >> has to appear next to that word in file C.
    >> How can I do this?

    >
    >
    > I think you may need to write a program to do this.
    >
    > If you get stuck, then post what you've written so far,
    > and we will help you fix it.
    >
    >
    > --
    > Tad McClellan
    > email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
     
    scarlet, Dec 13, 2009
    #2

  3. "scarlet" <> wrote:
    [Subject: newbie question]

    The first half of your subject is irrelevant and actually may cause some
    people to score down your posting.
    The second half is redundant because most initial postings involve a
    question.

    >I have two files: file A.tok and file B.lst.
    >File A contains a hash table of words and their frequency.
    >File B contains a list of words.
    >I have to generate a file C that contains the list of words from file A, and
    >if a word from file A matches a word from the list in file B, "VZ" has to
    >appear next to that word in file C.
    >How can I do this?


    What have you tried so far? Where are you stuck? Do you have a problem
    with designing the algorithm? Or do you have a problem with a specific
    function or feature? Or isn't your code doing what it is supposed to do?

    Actually, your question smells a little bit like homework....

    jue
     
    Jürgen Exner, Dec 13, 2009
    #3
  4. scarlet

    scarlet Guest

    This is what I have already:

    $file="VZ.lst";
    open(FILE,"$file");
    while ($lijn=<FILE>){


    @words=split(/\n\,$lijn);
    foreach $element(@words){


    $in="krantenartikel.tok";
    open(IN,"$in");
    while ($lijn1=<IN>){
    chomp $lijn1;
    ($token,$freq)=split(/\t/,$lijn1);
    }


    if ($element=$token){
    $freq="VZ";
    }
    else {
    $freq="";}
    }
    }
    $out='#krantenartikel.vz#';
    open(OUT,">$out");
    print OUT "$token\t$freq\n";

    First, I open the .lst file and define the array it contains. Then, I open
    the other file and make a table of the words and their frequency. I want to
    make a new file, "krantenartikel.vz", that contains the elements I mentioned
    earlier.

    I know the command "if ($element=$token)" is wrong, but my problem is that I
    don't know how else to write it so that it works.
     
    scarlet, Dec 13, 2009
    #4
  5. [Do not stealth-CC me, I happen to read the NGs I am posting in]
    [Do not top-post, that is poor style; trying to repair]

    "scarlet" <> wrote:


    >This is what I have already:
    >


    Missing
    use strict; use warnings;

    >$file="VZ.lst";
    >open(FILE,"$file");


    You should always test if an open() was successful:
    open(FILE,"$file") or die("Could not open $file because $!\n");
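The same check can be written with a lexical filehandle and the three-argument form of open(), which keeps the mode separate from the file name. This is only a small sketch; the helper name `open_or_die` is made up for illustration and is not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): open $path with $mode
# ('<' to read, '>' to write) or die with the file name and $!.
sub open_or_die {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path) or die("Could not open $path because $!\n");
    return $fh;
}

# Demonstration: opening a file that should not exist fails loudly,
# and eval catches the die so the message can be printed.
my $ok  = eval { open_or_die('<', 'no-such-dir/VZ.lst'); 1 };
my $msg = $ok ? "opened\n" : "error: $@";
print $msg;
```

The message includes `$!`, so the reason (for example "No such file or directory") ends up in the output instead of a silent failure later in the program.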

    >while ($lijn=<FILE>){
    >@words=split(/\n\,$lijn);


    This line causes a syntax error. I think you meant
    @words=split(/\n/,$lijn);
    instead.

    But I don't think it does what you meant it to do.
    You are reading the file line by line. That means there is exactly one
    newline at the very end of each string. There is not much sense in
    splitting the line at the very end. I think all you want here is a plain
    chomp() on the line itself. Or, if each line can contain multiple words,
    a split() on white space or whatever separates those words, but not on
    newline.
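A short sketch of both options, assuming a line may hold one or more words separated by white space. An in-memory string (the same trick used later in this thread) stands in for VZ.lst, and `read_words` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: collect all words from a filehandle, one line at a time.
sub read_words {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        chomp $line;                     # removes only the trailing newline
        push @words, split(' ', $line);  # ' ' splits on runs of white space
                                         # and ignores leading white space
    }
    return @words;
}

my $list = "apple\npear plum\n";         # made-up sample data
open(my $fh, '<', \$list) or die("Cannot open in-memory list: $!\n");
my @words = read_words($fh);
close($fh);
print join(',', @words), "\n";           # prints apple,pear,plum
```

If VZ.lst really has exactly one word per line, the chomp() alone is enough and the split() can be dropped.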

    >foreach $element(@words){
    >$in="krantenartikel.tok";


    Proper indentation makes the scope of a loop and in particular nested
    loops much, much easier to recognize.

    >open(IN,"$in");


    You should always test if an open() was successful:
    open(IN,"$in") or die("Could not open $in because $!\n");

    >while ($lijn1=<IN>){
    >chomp $lijn1;


    Good.

    >($token,$freq)=split(/\t/,$lijn1);


    Nice.

    >}
    >if ($element=$token){


    As you noted yourself, this is an assignment and certainly not what you
    want. Even ($element==$token) would be wrong because it would compare
    the numerical values of those two strings.
    To compare the textual value of two scalars use
    ($element eq $token)
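A small sketch contrasting the operators; `same_word` is an illustrative name. The point is that `eq` compares text while `==` compares numbers, so two strings that differ as text can still be numerically equal.

```perl
use strict;
use warnings;

# Illustrative helper: true when both words are the same text.
sub same_word {
    my ($x, $y) = @_;
    return $x eq $y;
}

for my $pair (['huis', 'huis'], ['huis', 'boom'], ['10', '10.0']) {
    my ($x, $y) = @$pair;
    my $text = same_word($x, $y) ? 'same text' : 'different text';
    print "$x vs $y: $text\n";
}

# '10' and '10.0' differ as text but are equal as numbers.
my $numeric = ('10' == '10.0') ? 'equal' : 'not equal';
print "numerically: $numeric\n";         # prints "numerically: equal"

# Note: if ($element = $token) would instead copy $token into $element
# and then test whether that copied value is true.
```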

    >$freq="VZ";
    >} else {
    >$freq="";}
    >}
    >}
    >$out='#krantenartikel.vz#';
    >open(OUT,">$out");


    You should always test if an open() was successful:
    open(OUT,">$out") or die("Could not open $out because $!\n");

    >print OUT "$token\t$freq\n";
    >
    >First, I open the .lst file and define the array it contains. Then, I open
    >the other file and make a table of the words and their frequency. I want to
    >make a new file, "krantenartikel.vz", that contains the elements I mentioned
    >earlier.


    There are a few more conceptual and algorithmic problems with your code.

    The most obvious issue is that you are printing only one single item to
    your output file. This is because the outermost while() ends before the
    print(), so the print will only be called exactly once at the very end
    of the program. Had you used proper indentation, this would have
    been obvious (I actually ran your code through indent-region in emacs).

    Same problem with the if(). It is executed AFTER the innermost while()
    loop has already terminated, thus you are testing only against the very
    last line of the krantenartikel.tok file.

    Both issues can be fixed with little effort, but your code is also very
    inefficient: for each line in VZ.lst you are looping through the whole
    krantenartikel.tok file. That is very costly: at O(n*m) it is a quadratic
    algorithm. It would be easy enough to do better than that by just
    reading all of krantenartikel.tok into memory once and then looping over
    the in-memory copy, which at least avoids re-reading the file for every word.
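A sketch of that intermediate fix, assuming the token file holds `word<TAB>frequency` lines and the list holds one word per line: the token file is read into an array once, and the comparison and the print() are moved inside the loops. It still does O(n*m) comparisons. `mark_nested` is a made-up name, and the filehandles are passed in so the sub can be tried on in-memory strings.

```perl
use strict;
use warnings;

# Hypothetical helper: same kind of output as the hash-based code in this
# thread, but with a nested loop over an in-memory array of tokens.
sub mark_nested {
    my ($tok_fh, $list_fh, $out_fh) = @_;

    # Read the token file once; keep only the first tab-separated field.
    my @tokens;
    while (my $line = <$tok_fh>) {
        chomp $line;
        my ($token) = split(/\t/, $line);
        push @tokens, $token if defined $token;
    }

    # For every listed word, scan the array: n words times m tokens.
    while (my $word = <$list_fh>) {
        chomp $word;
        my $found = 0;
        for my $token (@tokens) {
            if ($word eq $token) {
                $found = 1;
                last;                    # stop scanning at the first match
            }
        }
        print {$out_fh} $found ? "$word\tVZ\n" : "$word\n";
    }
    return;
}

my $tok  = "huis\t3\nboom\t1\n";         # made-up sample data
my $list = "boom\nfiets\n";
open(my $t, '<', \$tok)  or die("Cannot open tokens: $!\n");
open(my $l, '<', \$list) or die("Cannot open list: $!\n");
mark_nested($t, $l, \*STDOUT);           # prints "boom<TAB>VZ" and "fiets"
```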

    However, Perl has a data structure that makes answering "does X exist"
    trivial and very, very fast: a hash.
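A minimal illustration of that idea with made-up words: the hash keys act as a set, and exists() answers the "does X exist" question with a single lookup instead of a scan over a list. `make_set` is an illustrative name.

```perl
use strict;
use warnings;

# Illustrative helper: turn a list of words into a hash reference whose
# keys are the words; the values are just placeholders.
sub make_set {
    my %set;
    $set{$_} = 1 for @_;
    return \%set;
}

my $vz = make_set(qw(huis boom));        # made-up sample words
for my $word (qw(boom fiets huis)) {
    # exists only checks whether the key is present; the value is ignored
    my $line = exists $vz->{$word} ? "$word\tVZ\n" : "$word\n";
    print $line;
}
```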

    So, the revised plan is:
    - create a hash where the tokens from krantenartikel.tok are the keys
    - open the output file
    - open VZ.lst and for each word in that file
      check if it exists in the hash
      and print the proper output line
    - close and clean up everything

    Altogether I get this code, which compiles but which I couldn't
    test further because I don't have any test data:

    use strict; use warnings;

    my %tokens;

    my $in="krantenartikel.tok";
    open(IN,"$in") or die("Cannot open $in: $!\n");
    while (my $lijn1=<IN>){
        chomp $lijn1;
        my ($tok,$freq)=split(/\t/,$lijn1);
        $tokens{$tok} = $freq;
        #we don't really need to store the frequency, but because we
        #need some dummy value anyway we can just as well use that one
    }
    close(IN);

    my $out='#krantenartikel.vz#';
    open(OUT,">$out") or die("Cannot open $out: $!\n");
    my $file="VZ.lst";
    open(FILE,"$file") or die("Cannot open $file: $!\n");

    while (my $lijn=<FILE>){
        #I am assuming VZ.lst contains one word per line
        chomp $lijn;
        if (exists($tokens{$lijn})){
            print OUT "$lijn\tVZ\n";
        } else {
            print OUT "$lijn\n";
        }
    }

    close FILE;
    close OUT or die("Problem closing $out: $!\n");
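Note that the code above writes the words of VZ.lst (file B) to the output. If the requirement in the first post is read literally (file C lists the words of file A, the .tok file, with "VZ" next to those that also occur in file B), the same hash idea works with the roles swapped. This is a sketch under the same assumptions (tab-separated .tok lines, one word per line in VZ.lst); the sub name is made up, the frequency is not copied to the output, and it has not been run on the real files.

```perl
use strict;
use warnings;

# Hypothetical helper (name made up): copy the words of the .tok file
# (file A) to $out_fh, adding "\tVZ" to those that also occur in the
# .lst file (file B). Assumes "word<TAB>frequency" lines in file A and
# one word per line in file B; the frequency is not written out.
sub mark_tok_words {
    my ($tok_fh, $list_fh, $out_fh) = @_;

    # The words of file B become hash keys, used only for exists() lookups.
    my %in_list;
    while (my $word = <$list_fh>) {
        chomp $word;
        $in_list{$word} = 1 if length $word;
    }

    # Walk file A in its original order and print every token.
    while (my $line = <$tok_fh>) {
        chomp $line;
        my ($token) = split(/\t/, $line);
        next unless defined $token && length $token;
        print {$out_fh} exists $in_list{$token} ? "$token\tVZ\n" : "$token\n";
    }
    return;
}

# Possible use with the file names from this thread (untested on real data):
# open(my $tok, '<', 'krantenartikel.tok')  or die("Cannot open krantenartikel.tok: $!\n");
# open(my $lst, '<', 'VZ.lst')              or die("Cannot open VZ.lst: $!\n");
# open(my $out, '>', '#krantenartikel.vz#') or die("Cannot open output: $!\n");
# mark_tok_words($tok, $lst, $out);
# close($out) or die("Problem closing output: $!\n");
```

Building the hash from the smaller file and streaming the larger one keeps memory use proportional to the word list rather than the article.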



    jue
     
    Jürgen Exner, Dec 13, 2009
    #5
  6. scarlet

    Guest

    On Sun, 13 Dec 2009 13:36:51 +0100, "scarlet" <> wrote:

    >Hello,
    >I have two files: file A.tok and file B.lst.
    >File A contains a hash table of words and their frequency.
    >File B contains a list of words.
    >I have to generate a file C that contains the list of words from file A, and
    >if a word from file A matches a word from the list in file B, "VZ" has to
    >appear next to that word in file C.
    >How can I do this?
    >
    >thank you


    -sln
    -----
    the output:
    d
    e
    f
    cVZ
    d
    aVZ
    z
    bVZ aVZ
    -----

    use strict;
    use warnings;

    my $tokstring = "a afreq \n b bfreq \n c cfreq ";
    my $bstring = "d \ne \nf \nc \nd \na \nz \nb a\n ";

    open my $tfile, '<', \$tokstring or die "can't open tok file: $!";
    my %toks = map {/\s*([^\s]+)\s+([^\s]*)/, defined $1 ? ($1,$2) : ()} <$tfile>;
    close $tfile;

    open my $bfile, '<', \$bstring or die "can't open bstr file: $!";
    while (<$bfile>)
    {
        s/([^\s]+)(?=\s+)/exists $toks{$1} ? $1.'VZ': $1/ge;
        print;
    }
    close $bfile;
     
    , Dec 13, 2009
    #6
  7. ccc31807

    ccc31807 Guest

    On Dec 13, 7:36 am, "scarlet" <> wrote:
    > Hello,
    > I have two files : file A.tok and file B.lst


    It would be helpful if you posted a sample of each file, so we would
    know exactly what the files look like.

    CC.
     
    ccc31807, Dec 13, 2009
    #7