pattern matching

Lex

Hi,

I've got a flat text-file database whose records look like this:

ID|word|plural|synonym|meaning

It currently holds 268 records (a specialised dictionary).

When somebody looks up a word, the script checks for a match in the word, plural or
synonym field.
(So far I can manage that...)

However, I'd also like this: as soon as 'meaning' is shown to the
user (normally a few phrases), every word in it should be checked to see whether
it appears in the word, plural or synonym field of any of the (other)
records, so that I can link it to that record.

What's the best way of doing that? Any examples?

Thanks,

Lex
 
Matija Papec

Lex said:
However, what I'd want as well that as soon as 'meaning' is shown to the
user, (normally a few phrases) every word in it should be checked to see if
it's in any of the word, plural or synonym fields of all the (other)
records. (so I can link it to that record)

What's the best way of doing that? Any examples?

my %winfo;
for my $word (split /\s+/, $meaning) {
    # look up each word of the meaning; sub_id is a made-up helper name
    @winfo{qw/id plural synonym meaning/} = sub_id($word);
}

Here sub_id returns the field values for the given $word, and $meaning is the
already retrieved meaning of the word that was looked up.
 
Tad McClellan

Lex said:
Hi,

I've got a text file, database looking like:

ID|word|plural|synonym|meaning

now 268 records. (specialised dictionary)

When somebody looks up a word, it checks for a match in word or plural or
synonym.
(sofar I can even do it...)

However, what I'd want as well that as soon as 'meaning' is shown to the
user, (normally a few phrases) every word in it should be checked to see if
it's in any of the word, plural or synonym fields of all the (other)
records. (so I can link it to that record)

What's the best way of doing that? Any examples?


Any example data?

Untested, since I don't want to have to create test data:

my %xref;
while ( <DB> ) {
    my($id, $word, $plural, $synonym) = split /\|/;
    @xref{ $word, $plural, $synonym } = ($id, $id, $id);
}
...
foreach my $word ( $meaning =~ /(\S+)/g ) {
    print "make a link to ID $xref{$word}\n" if exists $xref{$word};
}
 
Lex

my %winfo;
for my $word (split /\s+/, $meaning) {
@winfo{qw/id plural synonym meaning/} = sub_id($word);
}

sub_id returns to you values for given $word, and $meaning is already
retrieved meaning for initial word.

Hi Matija,

thanks for the reply. I'm a newbie and I'm trying to see exactly what happens
in the code you wrote.

for my $word (split /\s+/, $meaning) { # every single word in $meaning
@winfo{qw/id plural synonym meaning/} = sub_id($word);

Shouldn't that be @winfo{qw/word plural synonym/} = sub_id($word);
Might well be that I don't understand what's going on here...

I think I didn't describe well what I'd like to happen, or I just
don't understand what you wrote.

I'll try again to explain what I'd like to happen:

There is a flat-file database, and somebody asks for the meaning of a word.
The meaning is shown.
But before that, every word in the meaning is checked to see whether it occurs in any of the
word, synonym or plural fields of all the other records. If it does, I can turn
that particular word in the meaning into a link, so people can look up the meaning of
that word as well.

To give an example:

db:
1|stove|stoves|heater|a thing to warm up a room
2|room|rooms|chamber|part of a building, often equipped with a stove

Now, if somebody looks up the word stove, the result would be:
stove: a thing to warm up a room
in which the word 'room' would link to a subroutine that shows the meaning of
'room' (and when people click on it, they'd see 'room: part of a
building, often equipped with a stove', where 'stove' would link to ... you get
my drift).

I think that might well be my problem: you do get my drift, but I don't get
yours. Excuse me if that is the case. If not, I hope I was able to describe
my 'problem' better.

Thanks,

Lex
 
Matija Papec

Lex said:
Hi Matija,

thanks for the reply. I'm a newbie and trying to see what exaclty happens
with what you wrote.

for my $word (split /\s+/, $meaning) { # every single word in $meaning
@winfo{qw/id plural synonym meaning/} = sub_id($word);

Shouldn't that be @winfo{qw/word plural synonym/} = sub_id($word);
Might well be that I don't understand what's going on here...

I don't know the actual code you have used so far, so I made up a
function name for something that retrieves the attributes of the requested word; e.g. if you want
to know all about "stove", sub_id("stove") returns a list of values,

('1', 'stoves', 'heater', 'a thing to warm up a room')

which are then stored in the %winfo hash, so $winfo{plural} contains 'stoves',
and so on.

sub_id should open your text file, split each line into fields and return the attributes when it
finds the wanted word (it could return only the meaning if that is all you need).

sub sub_id {

    my($word) = @_;
    my @values;
    my @fields = qw/id word plural synonym meaning/;

    ...
    LINE: while (my $line = <DAT>) {
        chomp $line;
        my %attr = ();
        @attr{@fields} = split /\|/, $line;

        for my $field (qw/word plural synonym/) {
            if ($attr{$field} eq $word) {
                # return everything except the word itself, to match
                # @winfo{qw/id plural synonym meaning/} in the caller
                @values = @attr{qw/id plural synonym meaning/};
                # word and plural take precedence over synonym
                last LINE if $field ne 'synonym';
            }
        }
    }

    return @values;
}
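
A self-contained variant of the lookup above may make it easier to try out. This is only a sketch under assumptions: the two records are adapted from Lex's stove/room example, the in-memory string $db stands in for the real file, and an empty list is returned when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical two-record database, adapted from the stove/room example.
my $db = <<'END';
1|stove|stoves|heater|a thing to warm up a room
2|room|rooms|chamber|part of a building, often equipped with a stove
END

# Return (id, plural, synonym, meaning) for the record matching $word,
# or an empty list when nothing matches.
sub sub_id {
    my ($word) = @_;
    my @fields = qw/id word plural synonym meaning/;
    my @values;

    # Read from the in-memory string; a real script would open the file.
    open my $fh, '<', \$db or die "cannot open in-memory data: $!";
    LINE: while (my $line = <$fh>) {
        chomp $line;
        my %attr;
        @attr{@fields} = split /\|/, $line, scalar @fields;
        for my $field (qw/word plural synonym/) {
            next unless defined $attr{$field} && $attr{$field} eq $word;
            @values = @attr{qw/id plural synonym meaning/};
            # a word or plural match wins over a synonym match
            last LINE if $field ne 'synonym';
        }
    }
    close $fh;
    return @values;
}

my %winfo;
@winfo{qw/id plural synonym meaning/} = sub_id('rooms');
print "$winfo{id}: $winfo{meaning}\n";
```

Note that this rereads the data for every word looked up; for a few hundred records that is probably acceptable, but building an index once avoids the repeated scan.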
 
Lex

Any example data?

Thanks I'll go and try it!

Sample data:

1|aardscheerder|aardscheerders|||Populaire, uit het Engels (Earth grazer) afkomstige benaming voor een planetoïde (of komeet) die de aarde zeer dicht kan naderen.||||||||||
2|aberratie||||Kleine, schijnbare verplaatsing van een hemellichaam in de richting waarin de waarnemer beweegt.||||||||||
3|absolute helderheid||||Helderheid die een hemellichaam zou hebben wanneer het zich op een afstand van 10 parsec zou bevinden.||||||||||
4|absorptielijn|absorptielijnen|||Smalle donkere lijn in een absorptiespectrum, veroorzaakt doordat een specifieke golflengte van het licht wordt geabsorbeerd door materie die zich tussen de lichtbron en de waarnemer bevindt.||||||||||
5|abundantie|abondantie|||De relatieve hoeveelheden van de verschillende scheikundige elementen die in sterren of gasnevels voorkomen.||||||||||

fields:
1 ID
2 word
3 plural
4 synonym
5 category
6 meaning
7 to 17 extra link info (URLs & descriptions)

Lex
 
Lex

my %xref;
while ( <DB> ) {
my($id, $word, $plural, $synonym) = split /\|/;
@xref{ $word, $plural, $synonym } = ($id, $id, $id);
}

This bit seems to work.
foreach my $word ( $meaning =~ /(\S+)/g ) {
print "make a link to ID $xref{$word}\n" if exists $xref{$word}
}

Here nothing seems to happen. I added $test .= $word; inside the loop and tried to print
$test, but got no output...

What am I doing wrong?

Thanks,

Lex
 
Bob Walton

use strict;
use warnings;

This bit seems to work.



Here nothing seems to happen. I included $test .= $word; and tried to print
it but 0 results...

What am I doing wrong?


I would suggest using the Perl debugger to find out:

perl -d scriptname.pl

See:

perldoc perldebug

....


Try something like this (note: additional words that are actually in
your sample "dictionary" were added to two of the "meanings" so non-null
results are obtained):

use strict;
use warnings;
my %xref;
my %meaning;
while ( <DATA> ) {
    chomp;
    my($id, $word, $plural, $synonym, $cat, $meaning) = split /\|/;
    @xref{($word, $plural, $synonym)} = ($id) x 3;
    $meaning{$id} = $meaning;
}
foreach my $id (keys %meaning) {
    foreach my $word ( $meaning{$id} =~ /(\S+)/g ) {
        print "make a link from ID $id to ID $xref{$word}\n"
            if exists $xref{$word};
    }
}
__END__
1|aardscheerder|aardscheerders|||aberratie Populaire, uit het Engels (Earth grazer) afkomstige benaming voor een planetoïde (of komeet) die de aarde zeer dicht kan naderen.||||||||||
2|aberratie||||Kleine, schijnbare verplaatsing van een hemellichaam in de richting waarin de waarnemer beweegt.||||||||||
3|absolute helderheid||||Helderheid die een hemellichaam zou hebben wanneer het zich op een afstand van 10 parsec zou bevinden.||||||||||
4|absorptielijn|absorptielijnen|||abundantie Smalle donkere lijn in een absorptiespectrum, veroorzaakt doordat een specifieke golflengte van het licht wordt geabsorbeerd door materie die zich tussen de lichtbron en de waarnemer bevindt.||||||||||
5|abundantie|abondantie|||De relatieve hoeveelheden van de verschillende scheikundige elementen die in sterren of gasnevels voorkomen.||||||||||

Then you'll have to figure out how to "make a link".
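
One possible way to "make a link" is a single substitution over the text, driven by the word index. The sketch below is an assumption-laden illustration, not the method used in this thread: %xref is filled by hand, $base_url and the linkify name are hypothetical, and the text is assumed to be plain text without HTML of its own.

```perl
use strict;
use warnings;

# Hypothetical index; in the thread %xref is built from the database.
my %xref = (stove => 1, stoves => 1, heater  => 1,
            room  => 2, rooms  => 2, chamber => 2);

# Hypothetical URL prefix for the lookup script.
my $base_url = 'db.cgi?ID=';

sub linkify {
    my ($text, $xref, $url) = @_;
    # Longest keys first so "rooms" is tried before "room";
    # skip empty keys that come from empty plural/synonym fields.
    my $alt = join '|',
              map  { quotemeta }
              sort { length($b) <=> length($a) || $a cmp $b }
              grep { length } keys %$xref;
    return $text unless length $alt;
    # One pass over the text, so inserted markup is never scanned again.
    $text =~ s{\b($alt)\b}{<a href="$url$xref->{$1}">$1</a>}g;
    return $text;
}

print linkify('a thing to warm up a room', \%xref, $base_url), "\n";
```

Matching whole dictionary entries with \b, rather than splitting on whitespace, also lets a word followed by a comma or full stop be found. Entries with accented letters at their edges may need extra care with \b, depending on how the text is decoded.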
 
Lex

Then you'll have to figure out how to "make a link".

Bob your code is great.

However, I still have a question.

My code now looks like this:

<code>
open (DB, "<$db_file_name_abc") or &cgierr("error in search. unable to open
database: $db_file_name_abc.\nReason: $!");
if ($db_use_flock) { flock(DB, 1); }

my %xref;
my %meaning;
while ( <DB> ) {
chomp;
my($id, $word, $plural, $synonym,$cat,$text) = split /\|/;
@xref{($word, $plural, $synonym, $text)} = ($id)x4;
$meaning{$id}=$meaning;
}

foreach my $id(keys %meaning){
foreach my $word ( $rec{'Text'} =~ /\S+/g ) {
my $newword = "<a
href=\"$db_dir_url/db.cgi?db=abc&uid=$db_uid&ID=$xref{$word}&mh=1&ww=1&view_
records=1\" class=\"abclink\" ONMOUSEOVER=\"popup('...','#C6CBDE')\";
ONMOUSEOUT=\"kill()\">$word</a>";
$rec{'Text'} =~ s|\b$word\b|$newword|gs if exists $xref{$word};
}
}

close DB;
</code>

The links now work as well when used with another database (the one that $rec{'Text'}
comes from).
The only thing I don't know how to do (and I've tried lots...) is this: where
you see the three dots in " popup('...', ", how do I get the original
$text field that belongs to this link in there? So, where $xref{$word} is now the ID
of the record that has to be shown, I'd like the ... to be that record's original text
field.

Any ideas?

Thanks a lot.

Lex
 
Bob Walton

Lex said:
Bob your code is great.

However, I still have a question.

My code now looks like this:

<code>
open (DB, "<$db_file_name_abc") or &cgierr("error in search. unable to open
database: $db_file_name_abc.\nReason: $!");
if ($db_use_flock) { flock(DB, 1); }

my %xref;
my %meaning;
while ( <DB> ) {
chomp;
my($id, $word, $plural, $synonym,$cat,$text) = split /\|/;
@xref{($word, $plural, $synonym, $text)} = ($id)x4;

----------------------------------------^^^^^
Are you sure you want to put a whole phrase in as a key in %xref? The
keys to %xref are supposed to be words according to the conventions used
so far.

$meaning{$id}=$meaning;

---------------------^^^^^^^^
Did you use strict; and use warnings; ? They would have pointed out
that this variable is undef at this point because you changed it from
$meaning to $text two lines above and didn't change it here. If it were
still stored in %meaning, then the text could be gotten back by ID just
by saying $meaning{$id}.

}

foreach my $id(keys %meaning){
foreach my $word ( $rec{'Text'} =~ /\S+/g ) {
my $newword = "<a
href=\"$db_dir_url/db.cgi?db=abc&uid=$db_uid&ID=$xref{$word}&mh=1&ww=1&view_
records=1\" class=\"abclink\" ONMOUSEOVER=\"popup('...','#C6CBDE')\";
ONMOUSEOUT=\"kill()\">$word</a>";
$rec{'Text'} =~ s|\b$word\b|$newword|gs if exists $xref{$word};
}
}

close DB;
</code>

The links works now as well while using another database (where $rec{'Text'}
is in).
The only thing I don't know how to do (and I've tried lots...) is how, where
you see " popup('...', ", the three dots, how to get there the original
$text field that goes with this link. So, where $xref{$word} is now the ID
of the record that has to be shown, I'd like the ... to be the original text
field...


If you hadn't made the mistake of changing $meaning to $text in all but
one of the places it is used, you could have used $meaning{$id} to get
that text -- just replace the ... with $meaning{$id}. You'd better make
sure the text doesn't contain any ' characters, and perhaps " characters
either. Maybe < and > also? Others? Probably best to run it through a
Javascript quoter, and then an HTML quoter?
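
The two quoting steps could look roughly like this. It is a minimal sketch, assuming the text ends up inside a single-quoted JavaScript string that itself sits inside a double-quoted HTML attribute; js_quote and html_quote are hypothetical helper names, and popup() is the page's own function from the code above.

```perl
use strict;
use warnings;

# Escape a string for use inside a single-quoted JavaScript literal.
sub js_quote {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;      # backslashes first
    $s =~ s/'/\\'/g;        # single quotes
    $s =~ s/"/\\"/g;        # double quotes
    $s =~ s/\r?\n/\\n/g;    # line breaks
    return $s;
}

# Escape a string for use inside a double-quoted HTML attribute.
sub html_quote {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;      # ampersands first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Hypothetical meaning text containing awkward characters.
my $meaning = q{It's a "heater" <small>};

# JavaScript quoting is applied first, HTML quoting second.
my $attr = html_quote("popup('" . js_quote($meaning) . "','#C6CBDE')");
print qq{<a href="#" onmouseover="$attr">stove</a>\n};
```

The order matters: the browser undoes the HTML quoting before the JavaScript is evaluated, so the JavaScript quoting has to be the inner layer.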


....
 
Lex

----------------------------------------^^^^^
Are you sure you want to put a whole phrase in as a key in %xref? The
keys to %xref are supposed to be words according to the conventions used
so far.

Well no, I just wanted to be able to use the (specific) meaning later on...

---------------------^^^^^^^^
Did you use strict; and use warnings; ? They would have pointed out
that this variable is undef at this point because you changed it from
$meaning to $text two lines above and didn't change it here. If it were
still stored in %meaning, then the text could be gotten back by ID just
by saying $meaning{$id}.

changed it back to meaning again
If you hadn't made the mistake of changing $meaning to $text in all but
one of the places it is used, you could have used $meaning{$id} to get
that text -- just replace the ... with $meaning{$id}. You'd better make
sure the text doesn't contain any ' characters, and perhaps " characters
either. Maybe < and > also? Others? Probably best to run it through a
Javascript quoter, and then an HTML quoter?

Hi Bob. First: my original question and problem is solved; it works perfectly
with this code:

open (DB, "<$db_file_name") or &cgierr("error in search. unable to open
database: $db_file_name.\nReason: $!");
if ($db_use_flock) { flock(DB, 1); }

my %xref;
my %meaning;
while ( <DB> ) {
chomp;
my($id, $word, $plural, $synonym,$cat,$meaning) = split /\|/;
@xref{($word, $plural, $synonym)} = ($id)x3;
$meaning{$id}=$meaning;
}

foreach my $id(keys %meaning){
foreach my $word ( $rec{'Text'} =~ /\S+/g ) {
my $newword = "<a
href=\"$db_script_link_url&ID=$xref{$word}&mh=1&ww=1&view_records=1\">$word<
/a>";
$rec{'Text'} =~ s|\b$word\b|$newword|gs if exists $xref{$word};
}
}

close DB;

What it does: when showing the field $rec{'Text'} from one of the records of
the database, it checks whether words, synonyms or plurals from any record in the
database are used in this field and, if so, creates links to those words.
Anyway, I don't need to tell you people that, do I? :) But just to be clear about
what I'm doing.

Now I wanted something more again (always the same...): when showing records
from another database, I wanted the same thing to happen. I got that working (the
links); it wasn't very hard. However, I now also wanted a popup
that already shows the meaning of the word (taken from the meaning field). Now,
here I run into trouble, and I think it's because the code is still
trying to do things with that meaning field as well. If it produces
just the text, it works. I thought it might be ' or " as you suggested, so I
made sure that I got rid of them. However, that didn't do the trick, and the
problem was worse than just that.

So I think $meaning{$id} in this case (below) carries more baggage than
what I am looking for. In this case I do not want links in this field.
I know I want a lot and am still not capable of producing it, which is quite
frustrating, but hey, this is my way of learning, I guess. Below I'll copy the code
that I tried but that gave me more than I wanted:

open (DB, "<$db_file_name_abc") or &cgierr("error in search. unable to open
database: $db_file_name_abc.\nReason: $!");
if ($db_use_flock) { flock(DB, 1); }

my %xref;
my %meaning;
while ( <DB> ) {
chomp;
my($id, $word, $plural, $synonym, $cat, $meaning) = split /\|/;
@xref{($word, $plural, $synonym, $meaning)} = ($id)x4;
$meaning{$id}=$meaning;
}

foreach my $id(keys %meaning){
foreach my $word ( $rec{'Text'} =~ /\S+/g ) {

if ($xref{$word}) {

my $newword = "<a
href=\"$db_dir_url/db.cgi?db=abc&uid=$db_uid&ID=$xref{$word}&mh=1&ww=1&view_
records=1\" class=\"abclink\"
ONMOUSEOVER=\"popup('$meaning{$id}','#ffffcc')\";
ONMOUSEOUT=\"kill()\">$word</a>";

$rec{'Text'} =~ s|\b$word\b|$newword|gs;
}
}
}

close DB;

As soon as I insert $meaning{$id} it all goes wrong, all of it; even $word
isn't what it's supposed to be anymore. The same happens after stripping all HTML,
' and " etc. from $meaning{$id}.

To all the people (still) reading this: thanks for your patience with me! :)

Lex
 
Bob Walton

Lex said:
....

What it does: when showing the field $rec{'Text'}from one of the records of
the database it checks if words, synonyms or plurals form all records in the
database are used in this field and if so create a link to those words.
Anyway, I don't need telling you people do I? :) But just to be clear about
what I'm doning.

Now I wanted something more again (always the same...), when showing records
from another database I wanted it to happen as well. Got that working (the
links), wasn't very hard. However, I now wanted a popup screen as well
already showing the meaning of the word (taken from the meaning field). Now,
here I run into troubles and I think it's because it's still working and
trying to do things with that meaning field as well. If it would produce
just the text it works. I thought it might be ' or " as you suggested so
made sure that I got rid of them. However, that didn't do the trick and the
problem was worse than just that.

So I think $meaning{$id} in this case (underneath) has more luggage than
that what I am looking for. In this case I do not want links in this field
now, I know I want a lot and am still not capable of producing it, quite
frustrating, but hey, this is my way of learning I guess. I'll copy the code
underneath that I tried but that was giving me more than I wanted:

open (DB, "<$db_file_name_abc") or &cgierr("error in search. unable to open
database: $db_file_name_abc.\nReason: $!");
if ($db_use_flock) { flock(DB, 1); }

my %xref;
my %meaning;
while ( <DB> ) {
chomp;
my($id, $word, $plural, $synonym, $cat, $meaning) = split /\|/;
@xref{($word, $plural, $synonym, $meaning)} = ($id)x4;
$meaning{$id}=$meaning;
}

foreach my $id(keys %meaning){
foreach my $word ( $rec{'Text'} =~ /\S+/g ) {

if ($xref{$word}) {

my $newword = "<a
href=\"$db_dir_url/db.cgi?db=abc&uid=$db_uid&ID=$xref{$word}&mh=1&ww=1&view_
records=1\" class=\"abclink\"
ONMOUSEOVER=\"popup('$meaning{$id}','#ffffcc')\";
ONMOUSEOUT=\"kill()\">$word</a>";

$rec{'Text'} =~ s|\b$word\b|$newword|gs;
}
}
}

close DB;

As soon as I insert $meaning{$id} it all goes wrong, all of it, even $word
isn't what it's supposed to be anymore. As well after cutting out all html
and ' and " etc. from $meaning{$id}.


If you are running this as a CGI script, run it for debugging purposes
at the command prompt and *use the Perl debugger*. With it you can step
through your program and observe the values of variables as you go. In
the excerpt above, for example, you will note that $rec{Text} (why are
you using a hash to hold just one scalar, anyway??) never gets set to
anything because the "$word" foreach loop loops over words in
$rec{Text}, but $rec{Text} starts out empty, so there are never any
words to start with (if you would pay attention to advice and use
strict; and use warnings; you would have known that right away). Thus,
that foreach body is never executed, so $rec{Text} never gets any
content. Proper indentation of the loops would help understanding of
the code, too.

You appear to be just trying random things rather than taking a
systematic approach to your programming problem. You should sit back
and develop an overview of what you want to do, and then outline the
small steps needed to accomplish that. Then tackle each of those steps,
using the debugger to assure that each statement is accomplishing its
purpose and that you actually have the data you expect in each variable.
When you get to the end of that, you will have working code.
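
Where stepping through in the debugger is not practical, a rough substitute for checking each step is to dump the intermediate data and look at it. The following is only a sketch of that idea: the two records are hypothetical (in the id|word|plural|synonym|category|meaning layout), and Data::Dumper, a core module, prints the lookup tables to STDERR.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order makes dumps easier to compare

# Hypothetical records in the id|word|plural|synonym|category|meaning layout.
my @lines = (
    '1|stove|stoves|heater||a thing to warm up a room',
    '2|room|rooms|chamber||part of a building',
);

# Step 1: build the lookup tables.
my (%xref, %meaning);
for my $line (@lines) {
    my ($id, $word, $plural, $synonym, $cat, $meaning) = split /\|/, $line;
    # index only the non-empty word forms
    $xref{$_} = $id for grep { defined($_) && length($_) } $word, $plural, $synonym;
    $meaning{$id} = $meaning;
}

# Step 2: check that the tables hold what is expected before going further.
print STDERR Dumper(\%xref, \%meaning);

# Step 3: check a single lookup in isolation.
my $target = $xref{room};
print "room -> ID $target: $meaning{$target}\n";
```

In a CGI setting, output written to STDERR usually ends up in the web server's error log rather than in the page.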

HTH.
....
 
Lex

If you are running this as a CGI script, run it for debugging purposes
at the command prompt and *use the Perl debugger*. With it you can step
through your program and observe the values of variables as you go. In
the excerpt above, for example, you will note that $rec{Text} (why are
you using a hash to hold just one scalar, anyway??) never gets set to
anything because the "$word" foreach loop loops over words in
$rec{Text}, but $rec{Text} starts out empty, so there are never any
words to start with (if you would pay attention to advice and use
strict; and use warnings; you would have known that right away). Thus,
that foreach body is never executed, so $rec{Text} never gets any
content. Proper indentation of the loops would help understanding of
the code, too.

You appear to be just trying random things rather than taking a
systematic approach to your programming problem. You should sit back
and develop an overview of what you want to do, and then outline the
small steps needed to accomplish that. Then tackle each of those steps,
using the debugger to assure that each statement is accomplishing its
purpose and that you actually have the data you expect in each variable.
When you get to the end of that, you will have working code.

Hi Bob,

you're right obviously but I think I should explain a bit here. It's not
that I don't want to learn, the opposite. But what I've got is a webserver
somewhere and a client. And 2 perl books. And the internet obviously.

I know I can install perl on my windows machine, but I reckon that for my scripts
to run (in the end they have to work on a unix-like machine), with all the
path info and stuff, it's not worth testing them first on my machine and then
uploading them, only to see that stuff doesn't work because it isn't a windows webserver...

To be honest: I lack the time to properly investigate this. Everybody
working with perl seems to talk about the command prompt, the perl debugger etc...
well, it's not an option for me (yet) (and to be honest, I don't want/need to use
perl from my command prompt either). I know you're right when you say that what
I need is a systematic approach, but for now I still know far too little
to just do it all myself. I need to look at code and try to see what it does,
and once I understand it I can pick it up myself. Most people don't learn
that way, I guess; they're better off with a book. I'm crap with books. They're good
to have, to look something up when I don't understand what's going on
somewhere. What I'd wish for is more time, more time to learn perl properly,
but for now, with this site having to be finished by the 19th of October...
:)

I'll go back and reread the messages and try to see if I can work it out.

I know my excuses are pretty lame ones, but I guess that's just my situation
right now and I can't change it within a few hours.

Anyway, I really am thankful for the insights you've shown.

Have fun,

Lex
 
Malcolm Dew-Jones

Lex wrote:

: I know I can install perl on my windows machine, but I reckon for my scripts
: to run (at the end they have to work on a unix-like machine) with all the
: path info and stuff it's not worth testing them first on my machine

You're wrong.

: To be honest: I lack the time to properly investigate this.

Sometimes the shortest way there is the long way round.
 
Bob Walton

Lex said:
....
I know I can install perl on my windows machine, but I reckon for my scripts
to run (at the end they have to work on a unix-like machine) with all the
path info and stuff it's not worth testing them first on my machine and than
upload and see that stuff doesn't work as it isn't a windows webserver... ....


Lex

You are overestimating the changes required when running Perl on a Unix
versus Windoze platform. I, for example, have run everything associated
with your problem (and most all the others) on Windoze 98SE (I do boot
to Linux sometimes, but not usually because of Perl). The Perl
development folks have done a stellar job of making Perl as
OS-independent as possible -- for example, the / versus \ path separator
issue is a non-issue internally in Perl -- Perl does what you mean. The
only important exception is with binary files -- you must use the
binmode() function when dealing with binary files on Windoze.
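
As an illustration of the binmode() point, here is a small round-trip sketch. The byte string and the temporary file are assumptions made up for the example; File::Temp is a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical binary payload containing CR and LF bytes.
my $bytes = "\x00\x0D\x0A\xFF";

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                 # no newline translation on write
print {$out} $bytes;
close $out or die "close failed: $!";

open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;                  # no newline translation on read
my $read = do { local $/; <$in> };
close $in;

print $read eq $bytes ? "round trip ok\n" : "bytes were altered\n";
```

On Unix-like systems binmode() makes no difference for plain byte files, so leaving the calls in keeps the same script usable on both platforms.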

It is super easy to get and install Perl on a Windoze machine -- one
option is visiting http://www.activestate.com and downloading their
version of Perl. If you have an older version of Windoze, you may need
to also download the "Micro$loth Installer" first, but that is spelled
out in relatively clear instructions on the site. Within minutes of
completing the download you can have Perl up and going, complete with
all the standard modules and real nice HTML docs. The only caution is
to not install a newer version of Perl over top of an older version --
that doesn't seem to fly too well.

You will find it 1000% easier to debug your scripts on your local
machine than on someone else's remote web server. And you can install a
web server on your own computer to assist with debugging CGI scripts
when they are actually running in a CGI environment. Trust me, it will
save you gobs of time to take the time to install Perl on your local
machine.
 
